PREFACE



This publication describes the design of the Machine-Check Handler (MCH) program and what it does to prevent or minimize downtime for System/370 Models 135 and 145.



The "Diagnostic Aids" section describes 
several techniques that can be used to 
determine the source of problems that arise 
in MCH. 



ORGANIZATION OF THIS MANUAL 

The "Introduction" summarizes the operation of MCH. This section contains definitions and descriptions needed to understand the second section, "Method of Operation."

The "Method of Operation" describes the 
functions of the program and shows how the 
major data areas are used by MCH. 

The "Program Organization" section 
describes the modules that constitute MCH 
and the operation of each of these modules. 
Flowcharts of each module are provided at 
the end of this section. 

"MCH Data Areas" describes the fields of 
information used by MCH in its principal 
data area, the MCH Common Area. 



The "MCH Module Directory" section is a 
guide to the named areas of code in the 
program listing. 



The appendixes contain a table showing where MCH messages originate, a detailed description of the machine-check interruption code, and the MCH wait state codes.
SECTION 1: INTRODUCTION



This publication describes the operations of the Machine-Check Handler (MCH) program for the IBM System/370 Models 135 and 145. The Machine-Check Handler for the Model 135 is a standard component of the MFT version of the System/360 Operating System. The Machine-Check Handler for the Model 145 is a standard component of both the MFT and MVT versions of the System/360 Operating System. The purpose of the Machine-Check Handler is to minimize the effects of machine malfunctions on jobs in process. When a machine-check interruption occurs, MCH does this by attempting to correct the malfunction and by producing diagnostic records and messages to help system maintenance personnel find the cause of the problem. See Figure 1 for an overview of the Machine-Check Handler.



When a machine-check interruption occurs, MCH immediately gets control. If a soft machine-check interruption occurs, MCH records information about the malfunction. If a hard machine-check interruption occurs, MCH both records information about the malfunction and attempts to shield the operating system from the adverse effects of the malfunction.

The machine logs out information 
describing the cause of the malfunction and 
the status of the system at the time of the 
interruption. This information is used by 
the Machine-Check Handler to carry out its 
recovery and recording operations. 



HARDWARE RECOVERY FEATURES OF THE MODELS 135 AND 145



RECOVERY DESIGN OF THE MODELS 135 AND 145 

Machine malfunctions originate in the CPU, main storage, and control storage. When one of these fails, hardware facilities attempt to correct the malfunction. CPU malfunctions are corrected through microprogram routines which retry the failing operation. This is called CPU retry. Malfunctions in main and control storage are corrected by Error Checking and Correction (ECC). These two recovery features are described in more detail later in this section.



The operation of the Machine-Check 
Handler depends on certain recovery actions 
taken by the hardware. It also depends on 
information given to it by the hardware. 
Some of the features of the hardware are 
described here. 



AUTOMATIC RECOVERY FEATURES 

The Models 135 and 145 have two "built-in" methods of recovering from machine malfunctions: CPU retry and ECC. Whenever circumstances permit, these two hardware features recover from machine malfunctions without assistance from the software.



CPU retry and ECC are not always successful in their attempts to correct a malfunction. For this reason there are two types of machine-check interruptions. A "soft" machine-check interruption (sometimes called a recovery report) is generated when:

• CPU retry has corrected the malfunction,

• ECC has corrected the malfunction (Model 145 only), or

• ECC has encountered a solid, single-bit error that has reached an Error Frequency Limit such that ECC is correcting the error 256 times within 416 microseconds (Model 135 only).

A "hard" machine-check interruption (sometimes called a damage report) is generated when the malfunction has not been corrected.
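The Model 135 Error Frequency Limit in the last bullet amounts to a sliding-window count. The Python sketch below models it in software purely for illustration (the hardware does this itself). The class name `ErrorFrequencyLimit`, its method, and the reading of "within 416 microseconds" as a window strictly shorter than 416 µs are assumptions, not details from this manual.

```python
from collections import deque


class ErrorFrequencyLimit:
    """Hypothetical software model of the Model 135 Error Frequency Limit (EFL).

    The limit is reached when `count` single-bit corrections fall within
    a window of `window_us` microseconds (256 and 416 in the text).
    """

    def __init__(self, count: int = 256, window_us: float = 416.0) -> None:
        self.count = count
        self.window_us = window_us
        self._times = deque()  # timestamps (microseconds) of recent corrections

    def record_correction(self, t_us: float) -> bool:
        """Record one ECC correction at time t_us; return True if the EFL is reached."""
        self._times.append(t_us)
        # Discard corrections that are 416 us or more older than this one.
        while t_us - self._times[0] >= self.window_us:
            self._times.popleft()
        return len(self._times) >= self.count
```

For example, corrections arriving every microsecond reach the limit on the 256th one, while corrections arriving every 2 µs never keep more than 208 entries in the window and so never reach it.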



CPU Retry 

CPU errors are automatically retried by microprogram routines. These routines save source data before it is altered by the operation. When an error is detected, a microprogram routine returns the CPU to the beginning of the operation, or to a point where the operation was executing correctly, and the operation is repeated. After eight unsuccessful retries, the error is considered permanent.

The CPU retry feature allows the machine 
to recover from temporary CPU failures that 
would otherwise make it necessary to reload 
the operating system or terminate the 
executing program. 

After each successful use of CPU retry, there is a soft machine-check interruption unless CPU retry is in quiet mode. After eight unsuccessful retries, there is a hard machine-check interruption.
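The retry policy in this subsection can be summarized in a few lines of Python. This is a software analogy only: the real retry is performed by microprogram routines that restore saved source data before repeating the operation. The names `run_with_retry` and `Report`, and the reading of "eight unsuccessful retries" as eight retries after the initial failing attempt, are assumptions.

```python
from enum import Enum
from typing import Callable

MAX_RETRIES = 8  # after eight unsuccessful retries the error is permanent


class Report(Enum):
    NONE = "no interruption"      # no error, or corrected while in quiet mode
    SOFT = "soft machine check"   # recovery report
    HARD = "hard machine check"   # damage report


def run_with_retry(operation: Callable[[], bool], quiet_mode: bool) -> Report:
    """Hypothetical model of CPU retry.

    `operation` returns True when it completes without a detected error.
    """
    if operation():
        return Report.NONE            # no error detected, nothing to report
    for _ in range(MAX_RETRIES):
        if operation():
            # Corrected by retry: soft report unless CPU retry is in quiet mode.
            return Report.NONE if quiet_mode else Report.SOFT
    return Report.HARD                # permanent error
```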
Figure 1. Machine-check handler overview



ECC Validity Checking

ECC checks the validity of data from main and control storage, automatically correcting single-bit errors. It also detects multiple-bit errors but does not correct them.



Data enters and leaves storage through a storage adapter unit. This unit makes the ECC validity check on each doubleword by ensuring that the doubleword contains the appropriate parity bit for each byte. If a single-bit error is detected, the erroneous bit is corrected. The corrected doubleword is then sent back into main or control storage and on to the CPU. MCH is notified by a machine-check interruption and retrieves the failing storage address from the fixed logout. Note that with MCH for the Model 135, the threshold for such soft machine checks must be exceeded before a machine-check interruption occurs.

When a multiple-bit storage error is detected, a machine-check interruption is generated, and the error location is placed in the fixed logout. MCH gains control and attempts to recover from the error.
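The "correct single-bit, detect multiple-bit" behavior of ECC can be illustrated with a generic single-error-correcting, double-error-detecting (SEC-DED) Hamming code. The Python sketch below protects only 4 data bits with an extended Hamming(8,4) code. It is a textbook illustration of the technique, not the check-bit layout of the Models 135 and 145 storage adapter, which works on doublewords and is not described here. Like any SEC-DED code, it reliably detects double-bit errors but is not guaranteed to detect every error of three or more bits.

```python
from typing import Tuple


def encode(nibble: int) -> int:
    """Encode 4 data bits as an 8-bit extended Hamming codeword.

    Bit i of the result holds codeword position i: position 0 is the
    overall parity bit, positions 1, 2, 4 are Hamming check bits, and
    positions 3, 5, 6, 7 carry the data bits.
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]  # covers positions 1, 3, 5, 7
    bits[2] = bits[3] ^ bits[6] ^ bits[7]  # covers positions 2, 3, 6, 7
    bits[4] = bits[5] ^ bits[6] ^ bits[7]  # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2            # makes the total parity even
    return sum(b << i for i, b in enumerate(bits))


def decode(word: int) -> Tuple[str, int]:
    """Return ("ok" | "corrected", data) or ("uncorrectable", -1)."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos           # XOR of the positions holding a 1
    parity_error = sum(bits) % 2 == 1
    if syndrome == 0 and not parity_error:
        status = "ok"
    elif parity_error:
        bits[syndrome] ^= 1           # single-bit error at position `syndrome`
        status = "corrected"          # (syndrome 0: the parity bit itself flipped)
    else:
        return ("uncorrectable", -1)  # even parity, nonzero syndrome: 2 bits wrong
    data = bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3
    return (status, data)
```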



FIXED STORAGE LOCATIONS 

There are four fixed storage areas in the Models 135 and 145: the fixed area in decimal locations 0-127, the I/O communications area in locations 160-191, the fixed logout area in locations 232-511, and the extended logout area. On the Model 135, the extended logout is a 14-byte field contained within the fixed logout area at location 256. On the Model 145, the extended logout is in locations 512-703, unless the pointer to the Model 145 logout area (control register 15) specifies otherwise.
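The storage map above can be captured as a small lookup table. The Python sketch below reports which of the areas named in this section contain a given decimal address. The function names are hypothetical, and the Model 145 extended logout is placed at the control register 15 address (512 by default) with its maximum length of 192 bytes.

```python
from typing import List, Tuple

# (name, first, last): decimal, inclusive, as given in the text.
FIXED_AREAS: List[Tuple[str, int, int]] = [
    ("fixed area", 0, 127),
    ("I/O communications area", 160, 191),
    ("fixed logout area", 232, 511),
]


def extended_logout(model: int, cr15: int = 512) -> Tuple[int, int]:
    """Return the (first, last) locations of the extended logout area."""
    if model == 135:
        return (256, 269)            # 14 bytes inside the fixed logout area
    if model == 145:
        return (cr15, cr15 + 191)    # at most 192 bytes at the CR15 address
    raise ValueError("only Models 135 and 145 are covered here")


def areas_containing(addr: int, model: int, cr15: int = 512) -> List[str]:
    """Names of the fixed storage areas that contain decimal address `addr`."""
    names = [name for name, first, last in FIXED_AREAS if first <= addr <= last]
    first, last = extended_logout(model, cr15)
    if first <= addr <= last:
        names.append("extended logout area")
    return names
```

For example, `areas_containing(260, 135)` returns both the fixed logout area and the extended logout area, because the Model 135 extended logout lies inside the fixed logout, while `areas_containing(600, 145)` returns only the extended logout area.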



Fixed Logout Area 

Data is put into the fixed logout area (232-511) when any type of machine-check interruption occurs. The data stored is processed by the Machine-Check Handler. The layout of this area is the same for all System/370 models; however, not all models use every field in the fixed logout. The fixed logout area contains the machine-check interruption code, which indicates the reason for the interruption. Other fields in the area preserve the status of the system at the time of the machine-check interruption and the contents of the general purpose, floating point, and control registers.



Extended Logout Area

The extended logout area contains data that is model-dependent. On the Model 145, the extended logout begins at the address specified in control register 15 and is at most 192 bytes long. Control register 15 is set to point to decimal location 512 by the hardware during IPL or system reset.

On the Model 135, the extended logout is contained in decimal locations 256 through 269 (an area within the fixed logout). If the extended logout mask bit in control register 14 is enabled for logouts, data is logged into the extended logout area for all types of machine-check interruptions. This data is recorded by MCH in the SYS1.LOGREC data set.



CONTROL REGISTERS 

Two control registers are used by MCH 
for loading and storing control 
information. 

Control register 14 contains mask bits which specify whether certain conditions can cause machine-check interruptions and mask bits which control the conditions under which an extended logout may occur.

Control register 15, used only on the 
Model 145, contains the address of the 
extended logout area. 

The control registers are referred to by MCH through the use of two privileged instructions: LOAD CONTROL and STORE CONTROL. LOAD CONTROL loads control information from main storage into control registers; STORE CONTROL transfers information from control registers to main storage.
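Two conventions matter when reading control-register descriptions: System/370 numbers register bits from 0 at the leftmost (most significant) position through 31, and LOAD CONTROL and STORE CONTROL operate on a range of registers R1 through R3 that wraps from register 15 back to register 0. The Python sketch below models both in software. The function names are hypothetical, and it does not model addressing, privilege checking, or storage alignment.

```python
from typing import List

NUM_CREGS = 16
REG_BITS = 32


def bit_mask(bit: int) -> int:
    """Mask for control-register bit `bit`, where bit 0 is the leftmost bit."""
    if not 0 <= bit < REG_BITS:
        raise ValueError("control-register bits are numbered 0-31")
    return 1 << (REG_BITS - 1 - bit)


def is_bit_set(value: int, bit: int) -> bool:
    return bool(value & bit_mask(bit))


def set_bit(value: int, bit: int) -> int:
    return value | bit_mask(bit)


def clear_bit(value: int, bit: int) -> int:
    return value & ~bit_mask(bit) & 0xFFFFFFFF


def reg_range(r1: int, r3: int) -> List[int]:
    """Registers r1 through r3, wrapping from 15 to 0."""
    count = (r3 - r1) % NUM_CREGS + 1
    return [(r1 + i) % NUM_CREGS for i in range(count)]


def load_control(cregs: List[int], r1: int, r3: int, words: List[int]) -> None:
    """Model LOAD CONTROL: copy successive storage words into cregs[r1..r3]."""
    regs = reg_range(r1, r3)
    if len(words) != len(regs):
        raise ValueError("need one storage word per register")
    for reg, word in zip(regs, words):
        cregs[reg] = word & 0xFFFFFFFF


def store_control(cregs: List[int], r1: int, r3: int) -> List[int]:
    """Model STORE CONTROL: return cregs[r1..r3] as successive storage words."""
    return [cregs[reg] for reg in reg_range(r1, r3)]
```

Under this numbering, bit 4 of control register 14 (which this manual identifies as the CPU retry recording mask) corresponds to the mask value 0x08000000.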

The publication IBM System/370 Principles of Operation, GA22-7000, contains a detailed description of the use of control registers.



MODES OF RECOVERY OPERATION 

The type of recording done by MCH depends upon the current "mode" of the CPU, main storage, and control storage. There are three possible modes: quiet mode, recording mode, and threshold mode.

In quiet mode, machine checks corrected by CPU retry or ECC do not cause machine-check interruptions.

In recording mode, machine failures corrected by these features do cause interruptions for recording purposes.

In threshold mode, a preset frequency of such errors must occur before a soft machine-check interruption occurs. Note that hard (uncorrected) machine failures always result in a machine-check interruption regardless of mode.

There is a MODE command that can be used 
to vary the current mode (see "Use of the 
MODE Commands" in this section). 



MODES OF RECOVERY OPERATION OF THE MODEL 135

The Model 135 can operate in either recording mode or quiet mode (see Figure 2).

CPU Malfunctions 

When the CPU is in recording mode, a soft machine-check interruption occurs each time a machine malfunction is repaired by CPU retry. When 20 such soft machine checks have occurred, the Soft Machine-Check Handler automatically switches the CPU from recording mode to quiet mode.

Note: Main and control storage are switched automatically to quiet mode along with the CPU. See "Use of the Mode Commands" in this section.

When the CPU is in quiet mode, no machine-check interruption is issued for a soft error. Switching from quiet mode to recording mode can be accomplished by issuing the MODE command.
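The automatic switching rules for the Model 135 (20 soft CPU-retry checks switch both the CPU and storage to quiet mode; reaching the Error Frequency Limit switches only storage) can be expressed as a small state machine. The Python sketch below is a hypothetical model: the class and method names are invented, and it tracks modes as booleans rather than as the control register 14 bits that the Soft Machine-Check Handler actually manipulates.

```python
from dataclasses import dataclass

SOFT_CHECK_THRESHOLD = 20  # soft CPU-retry machine checks before going quiet


@dataclass
class Model135Modes:
    cpu_recording: bool = True      # CPU retry in recording mode
    storage_recording: bool = True  # main and control storage (ECC) in recording mode
    error_count: int = 0            # CPU retry current error count

    def soft_cpu_check(self) -> None:
        """A soft machine check caused by a successful CPU retry."""
        if not self.cpu_recording:
            return  # in quiet mode no soft interruption is presented
        self.error_count += 1
        if self.error_count >= SOFT_CHECK_THRESHOLD:
            # One mask bit covers both CPU retry and ECC recording,
            # so storage goes quiet along with the CPU.
            self.cpu_recording = False
            self.storage_recording = False

    def efl_reached(self) -> None:
        """A soft ECC interruption caused by reaching the Error Frequency Limit."""
        # ECC can be masked off on its own, so the CPU stays in recording mode.
        self.storage_recording = False
```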



Figure 2. Modes of recovery operation (chart relating the mode, recording or quiet, to the error location: CPU, main storage, or control storage)

*Single-bit errors in control storage will generate an interruption only if the hardware-specified threshold is exceeded.



Main and Control Storage

In recording mode, a machine-check interruption occurs for each malfunction except solid, single-bit errors that occur below a certain rate. If the rate (or frequency) of single-bit errors becomes too high, a soft machine-check interruption occurs and main and control storage are automatically switched to quiet mode by the Soft Machine-Check Handler.

Note: The CPU is not automatically switched to quiet mode along with main and control storage. See "Use of the Mode Commands" in this section.

In quiet mode, no soft machine-check interruptions occur. A switch from quiet to recording mode can be made by issuing the MODE command.

MODES OF RECOVERY OPERATION OF THE MODEL 145

Three modes of operation are used for the Model 145: recording mode, quiet mode, and threshold mode. Depending on the source of the malfunction, one, two, or all three modes may apply (see Figure 2):

CPU Malfunctions

Only the recording mode applies to CPU operations. In this mode, a machine-check interruption occurs for each malfunction.

Main Storage Malfunctions

In recording mode, a machine-check interruption occurs for each malfunction except solid, single-bit errors. In quiet mode, only hard errors cause machine-check interruptions; soft ECC errors do not cause interruptions when main storage is in quiet mode.

Control Storage Malfunctions

In recording mode, a machine-check interruption occurs for each malfunction. In quiet mode, only hard errors generate machine-check interruptions; soft errors do not cause interruptions. In threshold mode, no interruptions are generated for soft ECC errors unless a specified number of soft errors occur within a specified time. The frequency of errors that will be tolerated is preset by the hardware. When that frequency is exceeded, a machine-check interruption occurs, control storage is automatically switched into quiet mode, and a message is sent to the operator.

USE OF THE MODE COMMANDS

The MODE command is an operator command used to switch between recording, quiet, and (Model 145 only) threshold modes. The Model 135 MODE command can also be used to display the current mode status.

Mode Command for the Model 135

The format of the Model 135 MODE command is:

Operation   Operand
MODE        {STATUS | HIR RECORD | ECC RECORD}

STATUS
causes the current status of both CPU retry (HIR) and ECC to be displayed in a message (IGF053I). The message also contains the CPU retry current error count and the error count threshold for soft machine checks. The response to the command MODE STATUS is:

IGF053I MODE STATUS-ECC {QUIET | RECORD} HIR {QUIET | RECORD} count-nn THRESHOLD-nn

When the current error count equals the error count threshold (20), the Soft Machine-Check Handler switches both CPU retry and ECC to quiet mode. ECC is automatically switched to quiet mode along with CPU retry because the bit used to mask off CPU retry recording mode (bit 4 in control register 14) also masks off ECC recording mode.

When an Error Frequency Limit (EFL) of 256 single-bit error corrections within 416 microseconds has been reached, a soft ECC interruption occurs and the Soft Machine-Check Handler switches ECC (main and control storage) to quiet mode. CPU retry is not automatically switched to quiet mode along with ECC because ECC can be masked off independently with a DIAGNOSE instruction. Notice that solid, single-bit error corrections do not cause machine-check interruptions unless they occur with a greater frequency than the Error Frequency Limit. Also note that only control storage is referenced frequently enough for the EFL to be exceeded.

HIR RECORD
causes the CPU retry feature to enter recording mode. When the command to enter recording mode is issued, the CPU current error count is reset to zero. If the CPU retry feature is already in recording mode when this form of the command is issued, the current error count is still reset to zero.

ECC RECORD
causes the ECC feature to enter recording mode. If this form of the MODE command is issued when CPU retry is in quiet mode, it is rejected as a command error. One bit in control register 14, which is used to mask off CPU retry recording mode, also masks off ECC recording mode. Therefore, CPU retry must be in recording mode before ECC can be switched to recording mode.
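The three Model 135 operands can be sketched as a command handler over a small mode state. In the Python below, the state class, the "OK" and "COMMAND ERROR" return strings, and the exact spacing of the IGF053I text are illustrative assumptions. The manual does not say whether HIR RECORD also re-enables ECC recording, so the sketch leaves ECC unchanged in that case.

```python
from dataclasses import dataclass

THRESHOLD = 20  # error count threshold for soft machine checks


@dataclass
class Mode135State:
    hir_record: bool = True   # CPU retry (HIR) in recording mode
    ecc_record: bool = True   # ECC in recording mode
    count: int = 0            # CPU retry current error count


def mode_command(state: Mode135State, operand: str) -> str:
    """Apply a Model 135 MODE command operand and return a reply."""
    op = " ".join(operand.upper().split())  # normalize case and spacing
    if op == "STATUS":
        ecc = "RECORD" if state.ecc_record else "QUIET"
        hir = "RECORD" if state.hir_record else "QUIET"
        return (f"IGF053I MODE STATUS-ECC {ecc} HIR {hir} "
                f"count-{state.count:02d} THRESHOLD-{THRESHOLD:02d}")
    if op == "HIR RECORD":
        state.hir_record = True
        state.count = 0           # reset even if already in recording mode
        return "OK"
    if op == "ECC RECORD":
        if not state.hir_record:
            # The CR14 bit that masks CPU retry recording also masks ECC recording.
            return "COMMAND ERROR"
        state.ecc_record = True
        return "OK"
    return "COMMAND ERROR"        # storage and THRES operands are Model 145 only
```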

Mode Command for the Model 145

The format of the Model 145 MODE command is:

Operation   Operand
MODE        {MAIN | CNTR},{RECORD | QUIET | THRES}

MAIN
causes main storage to be placed in the specified mode.

CNTR
causes control storage to be placed in the specified mode. Note that control storage is physically identical to main storage and that both are contained within the same unit. Main storage contains problem programs and control (supervisor) programs. Control storage contains the basic instruction set.

RECORD
causes the specified storage to be set to recording mode. In recording mode, a machine-check interruption occurs for all machine errors (except solid, single-bit errors¹), whether they have been corrected or not.

QUIET
causes the specified storage to be set to quiet mode. In quiet mode, machine checks corrected by CPU retry or ECC do not cause machine-check interruptions.

THRES
causes control storage to be set to threshold mode. This operand can only be used for control storage. When in threshold mode, a pre-specified number of soft errors must occur before a soft machine check is issued and control storage is automatically switched to quiet mode. Notice that solid, single-bit error corrections do not cause soft machine-check interruptions unless they occur in control storage and exceed the preset threshold.

WARNING: The Model 145 MODE command is intended for use only by IBM personnel. Issuing the RECORD form of this command at the wrong time can cause significant degradation in performance.

¹Solid, single-bit errors in control storage can sometimes reach such a frequency as to exceed a preset Error Frequency Limit. In such cases, a solid, single-bit error causes a machine-check interruption.
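The Model 145 operand rules (one storage operand, one mode operand, and THRES allowed only with CNTR) can be checked with a small parser. The Python sketch below is hypothetical: the function name, the returned tuple, and the error messages are not part of the operator interface described here.

```python
from typing import NamedTuple

STORAGE = {"MAIN": "main storage", "CNTR": "control storage"}
MODES = {"RECORD": "recording", "QUIET": "quiet", "THRES": "threshold"}
USAGE = "expected MODE {MAIN|CNTR},{RECORD|QUIET|THRES}"


class ModeRequest(NamedTuple):
    storage: str
    mode: str


def parse_mode_145(command: str) -> ModeRequest:
    """Validate a Model 145 MODE command and return what it requests."""
    parts = command.strip().upper().split(None, 1)
    if len(parts) != 2 or parts[0] != "MODE":
        raise ValueError(USAGE)
    operands = [p.strip() for p in parts[1].split(",")]
    if len(operands) != 2 or operands[0] not in STORAGE or operands[1] not in MODES:
        raise ValueError(USAGE)
    storage, mode = operands
    if mode == "THRES" and storage != "CNTR":
        raise ValueError("THRES can be used only for control storage")
    return ModeRequest(STORAGE[storage], MODES[mode])
```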



MCH ERROR RECOVERY 

Recovery from a machine malfunction is handled by both hardware recovery facilities and the MCH program. MCH recovery can be classified into three categories: system recovery, system-supported restart, and system repair. These levels of error recovery are discussed in the order in which they are attempted.



SYSTEM RECOVERY 

When the hardware cannot recover the system from the machine check, system recovery takes place. MCH attempts to keep the system working at the expense of the task in which the error appeared. The processing of the task containing the error is terminated either by normal methods of job termination (ABEND) or by marking the task nondispatchable. System recovery takes place only if the task in question is not critical to continued system operation. An error in a critical task requires a system-supported restart.



SYSTEM-SUPPORTED RESTART 

System-supported restart (warm start) requires the operator to re-IPL the system. The operator is notified that a critical error has occurred and that system continuation is impossible. This type of recovery is used when system recovery has failed or has been judged impossible.



SYSTEM REPAIR 

System repair takes place at the discretion of the operator. Usually, the operator will have tried to recover by system-supported restart one or more times with no success. An example is a hard error that occurs so frequently that system-supported restart would not be successful. System repair always requires the services of maintenance personnel.



PHYSICAL CHARACTERISTICS 



MAIN STORAGE REQUIREMENTS 

The Machine-Check Handler operates within the MCH Resident Area. The MCH Resident Area, as shown in Figure 3, occupies 4.9K bytes in the fixed area of main storage. The MCH Resident Area is divided into three sections: the MCH Nucleus Area, the MCH Transient Area, and the MCH Common Area.



MCH Nucleus Area: The MCH Nucleus Area contains the control module of the program. It is 2.3K bytes long and its contents remain in storage unchanged.



MCH Transient Area: The MCH Transient Area occupies 1K bytes of main storage adjacent to the MCH Nucleus Area. It is used by the MCH transient (or overlay) routines, which reside on SYS1.SVCLIB.



MCH Common Area: The MCH Common Area is used for intermodule communication and construction of the MCH error record. A portion of the MCH Common Area, called the Subsystem Data Area, is used for communication between MCH and any subsystems which may be running under the operating system. The MCH Common Area is partitioned into several smaller data areas. The contents of these partitions are described in Section 4.



The fixed and extended logout areas are reserved to log the data about the machine malfunction. The fixed logout area primarily contains information that is model-independent. It extends from location 176 to location 511 (decimal). MCH uses only the portion from 232 through 511. The extended logout area contains only model-dependent data and is 192 bytes long for the Model 145 and 14 bytes long for the Model 135.



MCH Nucleus Area
MCH Transient Area (1024 bytes)
Model Dependent Common Area (8 bytes)
Model Independent Common Area (950 bytes)*
Record Buffer Build Area (80 bytes)
Fixed Logout (280 bytes)
Extended Logout** (192 bytes)
Damage Assessment Field Buffer (74 bytes)
Subsystem Data Area*** (64 bytes)
MCH TTRs and successor list (75 bytes)

*Includes 216 bytes of ABRECs.
**For the Model 135, the Extended Logout is only 14 bytes long and is contained within the Fixed Logout area.
***Model 145 only; does not apply for the Model 135 and is not allocated.

Figure 3. MCH resident area
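Figure 3's sizes can be turned into offsets with simple addition. The Python sketch below assumes, purely for illustration, that the areas listed after the Transient Area are packed contiguously in the order shown, and that on the Model 135 neither the Extended Logout nor the Subsystem Data Area is allocated separately (footnotes ** and ***). Under those assumptions the listed areas total 1723 bytes on the Model 145 and 1467 bytes on the Model 135. The sketch also checks that the 280-byte Fixed Logout matches locations 232 through 511.

```python
from typing import Dict, List, Tuple

# (area, size in bytes) in the order Figure 3 lists them after the Transient Area.
AREAS: List[Tuple[str, int]] = [
    ("Model Dependent Common Area", 8),
    ("Model Independent Common Area", 950),
    ("Record Buffer Build Area", 80),
    ("Fixed Logout", 280),
    ("Extended Logout", 192),
    ("Damage Assessment Field Buffer", 74),
    ("Subsystem Data Area", 64),
    ("MCH TTRs and successor list", 75),
]
# Assumption based on the Figure 3 footnotes.
NOT_ALLOCATED_ON_135 = {"Extended Logout", "Subsystem Data Area"}


def layout(model: int) -> Dict[str, Tuple[int, int]]:
    """Map each allocated area to (offset, size), assuming contiguous packing."""
    result: Dict[str, Tuple[int, int]] = {}
    offset = 0
    for name, size in AREAS:
        if model == 135 and name in NOT_ALLOCATED_ON_135:
            continue
        result[name] = (offset, size)
        offset += size
    return result


def total_size(model: int) -> int:
    return sum(size for _, size in layout(model).values())
```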
AUXILIARY STORAGE REQUIREMENTS

The MCH transient modules occupy 13K 
bytes (Model 145) or 10K bytes (Model 135) 
in SYS1.SVCLIB on the primary SYSRES 
device. The MCH Nucleus and the MCH 
Initialization module (used to initialize 
MCH during NIP operations) are allocated 
8.7K bytes on SYS1.LINKLIB. 



The configuration of the system in use determines the amount of space required in SYS1.LOGREC for MCH to write its error records. (See IBM System/360 Operating System Storage Estimates, GC28-6551, for details.) Figure 4 shows the Machine-Check Handler in main storage and auxiliary storage.
OVERLAY STRUCTURE OF MCH

While the MCH Nucleus remains in main storage at all times, most other MCH modules are in main storage only when they are being used; these are called transient modules. These nonresident modules are stored in the SYS1.SVCLIB data set. Figure 5 illustrates the overlay structure of MCH.

When the Machine-Check Handler is not being used, the Soft Machine-Check Handler occupies the transient area. The Soft Machine-Check Handler is an MCH module that prepares the recovery report for soft machine-check interruptions. Having the Soft Machine-Check Handler reside in the transient area eliminates the need to bring in modules from auxiliary storage when a soft machine-check interruption occurs.

The MCH Nucleus, which can be thought of as the control program for the Machine-Check Handler, resides permanently in the MCH Nucleus Area. The Module Loader is included in the MCH Nucleus. When a machine-check interruption occurs and the Nucleus determines that transient modules are needed to continue processing it, control is given to the Module Loader to load a transient module from SYS1.SVCLIB. The first module brought into the transient area then overlays the Soft Machine-Check Handler. Each transient module can determine which transient module will succeed it. When the current transient module finishes its processing, it specifies the logical path number of the successor module to the Module Loader. The Module Loader then transfers control to the I/O Supervisor, which reads the next module into the transient area. After all processing has been completed, the Soft Machine-Check Handler is read back into the transient area. Except for system termination, the Soft Machine-Check Handler is always the final successor module, since it must be resident when MCH is again given control. When the system must be terminated, the Emergency Recorder is the last module in the transient area.
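The successor mechanism described above is essentially a loop in which each module names the next one. The Python sketch below models it with a table keyed by logical path number. The path numbers, module names, and the `run_overlays` function are hypothetical, and reading a module into the transient area through the I/O Supervisor is represented only by recording its name.

```python
from typing import Callable, Dict, List, Optional, Tuple

SOFT_MCH = "Soft Machine-Check Handler"
EMERGENCY_RECORDER = "Emergency Recorder"
FINAL_MODULES = {SOFT_MCH, EMERGENCY_RECORDER}

# A transient module works on the Common Area and returns its successor's path number.
Module = Callable[[dict], int]
PathTable = Dict[int, Tuple[str, Optional[Module]]]


def run_overlays(first: int, table: PathTable, common: dict, limit: int = 32) -> List[str]:
    """Return the names of the modules read into the transient area, in order."""
    loaded: List[str] = []
    path = first
    for _ in range(limit):
        name, module = table[path]
        loaded.append(name)  # stands in for the I/O Supervisor reading the module
        if name in FINAL_MODULES:
            # Soft MCH: normal end; it stays resident for the next interruption.
            # Emergency Recorder: last module when the system must be terminated.
            return loaded
        assert module is not None, "non-final modules must name a successor"
        path = module(common)  # the module names its own successor
    raise RuntimeError("successor chain did not reach a final module")
```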



Figure 4. Main storage and auxiliary storage relationships (elements shown: main storage with the MCH Nucleus Area, MCH Transient Area, and MCH Common Area; the primary SYSRES device; SYS1.LINKLIB; SYS1.SVCLIB holding the MCH transient modules; SYS1.LOGREC holding the MCH error records; IPL)

Figure 5. MCH overlay structure
SECTION 2: METHOD OF OPERATION



This section describes the functions of the Machine-Check Handler. For the reader who is unfamiliar with the operation of the program, this section serves as an introduction to the logic described in the "Program Organization" section of the manual. For the reader familiar with MCH, this section, especially the illustrations, can be used for review.

This section is divided into two parts: 
the first tells why MCH operates as it 
does, and the second shows the operations 
that take place. 



COMMUNICATIONS 

To deal effectively with each machine-check interruption, MCH must have certain data concerning the nature of the malfunction. The hardware produces a logout that gives the Machine-Check Handler the information it needs to properly analyze the error. The Machine-Check Handler moves this information into the MCH Common Area. The transient modules use the Common Area to communicate with each other. The MCH Nucleus also uses the Common Area to store and retrieve data about the attempted recovery. Section 4 of this manual describes the Common Area fully.



THE LOGIC OF MCH 

MCH has two basic methods of operation: one for hard machine-check interruptions and one for soft (see Figures 6, 7, and 8). In processing a hard machine-check interruption, the Machine-Check Handler goes through four stages of operation:

1. Initialization 

2. Hardware error analysis 

3. Program damage recovery 

4. Recording and termination 

For a soft machine-check interruption, step 3, program damage recovery, is omitted. Since, by definition, a soft machine-check interruption signifies that the error has already been corrected by the circuitry (CPU retry and ECC), program damage recovery is not necessary. Figures 6, 7, and 8 illustrate the general processing involved in handling each type of machine-check interruption.
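The difference between hard and soft processing comes down to which of the four stages run. A minimal Python sketch, with a hypothetical function name:

```python
from typing import List

STAGES = (
    "initialization",
    "hardware error analysis",
    "program damage recovery",
    "recording and termination",
)


def stages_for(hard: bool) -> List[str]:
    """Stages MCH goes through; soft errors skip program damage recovery."""
    return [s for s in STAGES if hard or s != "program damage recovery"]
```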

In addition to the four steps mentioned above, the Machine-Check Handler controls whether the machine will operate in recording mode or quiet mode. This function is logically independent from normal MCH processing.

The machine must communicate with the Machine-Check Handler, and the components of the Machine-Check Handler must communicate with each other. In addition, MCH externally communicates with the operator and system maintenance personnel.



INITIALIZATION 

MCH normally receives control through a machine-check interruption. Should another machine-check interruption occur while MCH has control, processing would stop and control would go back to the beginning of MCH. As a result, the first machine-check interruption would never be processed, since the information about it in the logout area would be lost. To minimize this possibility, MCH receives control with the system disabled for further interruptions. Disabling, however, is only a temporary measure to give the MCH Nucleus time to make some emergency provisions. The following initializing steps are taken by the MCH Nucleus:

1. It disables soft machine-check interruptions. Since soft errors have already been corrected, priority to interrupt MCH processing is given to hard errors. If the error being handled is hard and MCH is attempting to recover, there is no need to interrupt processing to report a soft error. If the present error is soft, there is no reason for one soft error to have priority over another.

2. It saves the contents of the fixed logout area in the MCH Common Area. If a hard machine-check interruption occurs now, the original data will not be overlaid by data from the second error. Also, extended logouts are prevented via a mask setting in control register 14 so that the extended logout area will not be overlaid.

3. It saves the machine-check old PSW. Then, if a second error should occur and cause the machine-check old PSW to be overwritten, control can still be given back (through an LPSW instruction) to the program that was interrupted first (provided the error was corrected with the original system status intact).

4. It alters the address in the machine-check new PSW to point to the SHUT (Special Handler for Unusual Termination) routine. A second machine-check interruption now sends control to the SHUT routine rather than to the beginning of MCH. Note that a second machine-check interruption implies an error within MCH or IOS. If the machine-check new PSW were never altered and the error recurred, the Machine-Check Handler would go into a loop. Also, since the second error is within MCH, it is recognized that MCH is operating in a degraded state and might not be able to recover from the original error.

5. It alters the address in the program-check new PSW to point to a special program-check handler that intercepts and recognizes all program-check interruptions. If the interruption is caused by a Monitor Call, it is ignored. If the interruption is not caused by a Monitor Call, control is passed to the SHUT routine.

6. It enables hard machine-check interruptions. Soft machine-check interruptions remain disabled until error recording is completed.

Figure 6. General processing of hard errors

Figure 7. General processing of Model 145 soft errors

Figure 8. General processing of Model 135 soft errors
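The six initialization steps, and the special program-check handler of step 5, can be sketched as state changes on a simplified machine model. Everything in the Python below is hypothetical: the `Machine` and `CommonArea` fields stand in for PSW and control-register contents, and PSW targets are represented symbolically as strings. The split between a PSW machine-check mask and control register 14 masks for soft reports is an assumption based on general System/370 architecture, not a detail stated in this section.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    psw_mc_mask: bool = False           # machine checks masked off on entry to MCH
    soft_report_mask: bool = True       # CR14 mask for soft (recovery) reports
    extended_logout_mask: bool = True   # CR14 mask allowing extended logouts
    fixed_logout: bytes = b""
    mc_old_psw: int = 0
    mc_new_psw_target: str = "MCH"
    pc_new_psw_target: str = "system program-check handler"

    def presents(self, hard: bool) -> bool:
        """Would a machine check of this kind interrupt now?"""
        return self.psw_mc_mask and (hard or self.soft_report_mask)


@dataclass
class CommonArea:
    saved_fixed_logout: bytes = b""
    saved_mc_old_psw: int = 0


def initialize(m: Machine, common: CommonArea) -> None:
    m.soft_report_mask = False                          # 1. soft checks stay off
    common.saved_fixed_logout = m.fixed_logout          # 2. copy the fixed logout
    m.extended_logout_mask = False                      #    and block extended logouts
    common.saved_mc_old_psw = m.mc_old_psw              # 3. save the old PSW
    m.mc_new_psw_target = "SHUT"                        # 4. second machine check -> SHUT
    m.pc_new_psw_target = "MCH program-check handler"   # 5. intercept program checks
    m.psw_mc_mask = True                                # 6. allow hard machine checks


def mch_program_check(is_monitor_call: bool) -> str:
    """Step 5 handler: ignore a Monitor Call, send anything else to SHUT."""
    return "ignore" if is_monitor_call else "SHUT"
```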

There is always the danger that a machine malfunction may occur immediately after MCH is entered, while the system is disabled for interruptions. If this happens, the machine comes to a hard stop: no instructions are executed and no interruptions occur. The machine can only be removed from the hard stop by a system reset or IPL.

Figure 9 shows MCH responses to various 
error-on-error conditions. 

Saving the Environment 

The Machine-Check Handler saves the fixed logout and the machine-check old PSW to protect them from a second machine-check interruption. The address where the fixed logout is saved is contained in MCHINLOG in the Common Area. Once the system has been reenabled for interruptions, MCH saves the permanent storage assignment (PSA). The PSA extends from location 0 through location 128 (decimal). The four data areas are saved in the following locations:

• Fixed Logout - address contained in 
MCHINLOG in the MCH Common Area 

• Extended Logout of Model 135 - a fixed 
save area (decimal location 256) within 
the Fixed Logout 

• Extended Logout of Model 145 - address 
contained in control register 15 

• Machine-Check Old PSW - MCHRPSW 
(REMCOPSW) in the Common Area 

• Permanent Storage Assignment - MCHPSA 
in the MCH Common Area 

Figures 10 and 11 illustrate the MCH environment before and after initialization.

Module Loading 

The Machine-Check Handler uses the faci- 
lities of the I/O Supervisor to bring the 
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Figure 10. MCH and environment before initialization 
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MCH transient modules into main storage. 
Since an I/O interruption takes place after 
the new module has been read into the MCH 
transient area, MCH saves the address por- 
tion of the I/O new PSW and replaces it 
with the address of a section of its own 
code. This permits the Machine-Check 
Handler to service the I/O interruption. 
The original address contained in the I/O 
new PSW is replaced prior to returning to 
the system. 
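The handling of the I/O new PSW follows a familiar save, redirect, and restore pattern for an interruption vector. The Python sketch below models only the address portion of the PSW. The class, function names, and addresses are hypothetical, and a context manager stands in for the explicit save and restore that MCH performs in its own code.

```python
from contextlib import contextmanager
from typing import Iterator


class LowStorage:
    """Hypothetical stand-in for the fixed storage locations MCH modifies."""

    def __init__(self, io_new_psw_address: int) -> None:
        self.io_new_psw_address = io_new_psw_address


@contextmanager
def mch_owns_io_interruptions(storage: LowStorage, mch_handler: int) -> Iterator[LowStorage]:
    """Point the I/O new PSW at MCH code while a transient module is read in."""
    saved_address = storage.io_new_psw_address  # save the system's address
    storage.io_new_psw_address = mch_handler     # I/O interruptions now reach MCH
    try:
        yield storage
    finally:
        # Restore the original address before control returns to the system.
        storage.io_new_psw_address = saved_address


low = LowStorage(io_new_psw_address=0x1A40)
with mch_owns_io_interruptions(low, mch_handler=0x6F00):
    print(hex(low.io_new_psw_address))  # 0x6f00
print(hex(low.io_new_psw_address))      # 0x1a40
```

The `finally` clause mirrors the requirement that the original address be put back even if loading does not complete normally.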

Figure 12 shows the module loading 
operation and explains the logic of schedu- 
ling a successor module. 
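The successor calculation in Figure 12 is plain index arithmetic: the saved displacement plus the MCHNXMOD code, minus 1. The Python sketch below reproduces the figure's example. The successor list contents are hypothetical: only the entry the example uses is filled in, on the assumption that each entry is a one-byte module ID such as X'40' for IGFMCH40.

```python
def successor_index(mchnxmod: int, saved_displacement: int) -> int:
    """Position in the successor list: saved displacement + MCHNXMOD - 1."""
    return saved_displacement + mchnxmod - 1


def next_module_id(successor_list: bytes, mchnxmod: int, saved_displacement: int) -> int:
    """Look up the successor, rejecting an ID of 00 (invalid successor)."""
    module_id = successor_list[successor_index(mchnxmod, saved_displacement)]
    if module_id == 0x00:
        raise ValueError("module specified an invalid successor")
    return module_id


# Hypothetical successor list: only the entry used by the Figure 12 example is set.
successor_list = bytearray(0x40)
successor_list[0x1C] = 0x40  # assumed one-byte ID for IGFMCH40

# PEA (IGFMCH41) is loaded, so X'1B' was saved; PEA puts X'02' in MCHNXMOD.
index = successor_index(0x02, 0x1B)
print(f"X'{index:02X}'")                                       # X'1C'
print(f"X'{next_module_id(successor_list, 0x02, 0x1B):02X}'")  # X'40'
```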



HARDWARE ERROR ANALYSIS 

To accurately assess the extent of the damage at the time of the machine-check interruption, the MCH Nucleus and the Preliminary Error Analysis (PEA) modules analyze the hardware error. MCH must identify the type of error, where it occurred, and under what special circumstances, if any.

To understand the hardware analysis function, some of the major fields of the machine-check interruption code are discussed first. Appendix B describes the interruption code fully.



MACHINE CHECK SUBCLASSES : Bits 0 through 4, the subclasses, indicate the machine-check condition causing the interruption. On each interruption, at least one of these bits must be set. If multiple errors have occurred, several bits may be on.



TENSE : This field indicates the timeliness of the interruption status. For example, bit 14, when one, indicates that the instruction address in the machine-check old PSW points to the instruction in which the error occurred. If the bit is zero, the instruction address points to an instruction beyond the point of error.



STORAGE ERRORS : This field informs MCH 
that the error was in main storage. 

VALIDITY : The validity bits represent the various fields stored during the machine-check interruption. Any bit that is zero indicates that the associated data (general registers, condition code, etc.) has been affected by the error.

EXTENDED LOGOUT LENGTH : (Model 145 only) 
This field indicates the length in bytes of 
the extended logout area pointed to by con- 
trol register 15. 
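Because IBM numbers bits from the left (bit 0 is the most significant bit), testing a field of the 64-bit interruption code means shifting right by 63 minus the bit number. The Python sketch below decodes the subclass bits and the bit-14 tense indicator discussed above. Only bit 1 (instruction processing damage) and bit 14 are taken from this section; the names given to bits 0, 2, 3, and 4 are assumptions (following System/370 naming, where "system recovery" would cover the successful-retry case) and should be checked against Appendix B.

```python
# IBM bit numbering: bit 0 is the most significant bit of the 64-bit code.
def bit(code: int, n: int) -> int:
    """Return bit n (0-63, numbered from the left) of the interruption code."""
    return (code >> (63 - n)) & 1


# Subclass bits 0-4. Only bit 1 is named in this section; the other names are
# assumptions to be checked against Appendix B.
SUBCLASSES = {
    0: "system damage",
    1: "instruction processing damage",
    2: "system recovery",
    3: "timer damage",
    4: "time-of-day clock damage",
}
TENSE_BIT = 14  # 1: the old PSW points to the instruction in error


def decode(code: int) -> dict:
    """Summarize the subclass and tense fields of an interruption code."""
    subclasses = [name for n, name in SUBCLASSES.items() if bit(code, n)]
    if not subclasses:
        raise ValueError("at least one subclass bit (0-4) must be set")
    return {
        "subclasses": subclasses,
        "psw_at_failing_instruction": bool(bit(code, TENSE_BIT)),
    }


# Instruction processing damage with the old PSW pointing at the failing instruction.
code = (1 << (63 - 1)) | (1 << (63 - TENSE_BIT))
print(decode(code))
# {'subclasses': ['instruction processing damage'], 'psw_at_failing_instruction': True}
```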



Types of Hardware Malfunctions 

The following types of hardware failures can be identified by the MCH Nucleus from the machine-check interruption code:

• System Damage - An error occurred that could not be attributed to the instruction referred to by the machine-check old PSW.

• Instruction Processing Damage - An error occurred during the processing of the instruction indicated by the machine-check old PSW. The instruction was either unretryable or unsuccessfully retried, or the damage resulted from a multiple-bit failure in main storage or a Storage Protect Feature (SPF) error.

• CPU Retry Successful - (Soft error) The CPU instruction was successfully retried.

• ECC Successful - (Soft error) A single-bit storage error was corrected by the ECC facility.

• Time of Day Clock Damage - An error occurred in the time-of-day clock, making it invalid for time stamping.

• Timer Damage - The high-resolution timer at location 80 contains a parity error.

System Damage 

System damage occurs when the machine circuitry or the microcode in the CPU has failed. Multiple-bit errors in control storage are included in this category. By presenting the malfunction as system damage in the machine-check interruption code, the machine informs MCH that system operation must stop. In this case the MCH Nucleus places the system in the wait state.

Instruction Processing Damage 

Any type of instruction processing 
damage has some program damage or potential 
program damage associated with it. MCH 
must therefore ultimately associate the 
error with a system or user task and then 
take whatever action is necessary to keep 
the system operating. The first step is to 
determine from the bit settings in the 
interruption code the type of error that 
occurred. The common bit settings for 
various types of machine malfunctions are 
shown in Appendix B. 

FINDING SUCCESSOR MODULE
LOADING SUCCESSOR MODULE

Procedure

1. Module Loader saves the displacement to the successor list for the module just loaded.

2. Transient module places a code into MCHNXMOD to designate its successor.

3. Module Loader adds the saved displacement to MCHNXMOD and subtracts 1 to determine the successor.

Example

1. If PEA (IGFMCH41) is loaded, the Module Loader saves X'1B' from its Displacement Table.

2. If PEA wants control passed to the Soft Machine-Check Handler (IGFMCH40), it places X'02' in MCHNXMOD.

3. Module Loader adds MCHNXMOD and the saved displacement, and subtracts 1:

      MCHNXMOD                X'02'
      Saved Displacement    + X'1B'
                              -----
                              X'1D'
      Subtract 1            - X'01'
                              -----
                              X'1C'

   The result points to 40 (IGFMCH40) in the successor list.

An ID of 00 indicates that a module has specified an invalid successor.

IGFMCH91 is the TSO Analysis module. Its original name is IKJEAM00, but it is linked into SYS1.SVCLIB at System Generation time as IGFMCH91.

Figure 12. Finding and loading MCH modules (Model 145 modules illustrated)

There are three types of malfunctions that are classified as instruction processing damage. Bit one, the instruction-processing-damage bit, is on (set to 1) in all cases. The remaining related bit settings indicate the type of instruction processing damage that occurred. (Appendix B describes the use of each bit in the interruption code.)

1. Retry failed - This condition is indi- 
cated when the instruction processing 
damage bit is on and the error is 
neither a multiple-bit error nor an 
SPF key error. Since the PSW is 
pointing to the failing instruction 
and the instruction address is valid, 
MCH assumes that the CPU has retried 
the instruction but has not been 
successful. 

2. Multiple-bit error in main storage - 
MCH, through the Preliminary Error 
Analysis routine, determines whether 
this type of error is solid or inter- 
mittent by finding the location of the 
error and doing a series of stores and 
fetches using that area. This is 
termed exercising a location. 

If data changes during a store or fetch, or if another machine-check interruption occurs during the exercise, MCH labels the error solid. Otherwise, MCH labels the error intermittent.

Since a valid machine-check interruption must be anticipated each time data is fetched from or stored into the location, the address in the machine-check new PSW is altered to point to code that services the expected interruption. The result of this test is placed in the Common Area. MCH eventually uses this information to assess the damage to the task occupying that particular section of main storage.

3. SPF key error - The severity of an SPF key error is determined in a way similar to that used for the multiple-bit storage error. The machine-check new PSW is made to point to a section of code that will service the expected machine-check interruption. A succession of fetches using all 16 possible key patterns is made to determine whether the error is solid or intermittent. The result of this analysis is placed in the MCH Common Area.
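Both exercises above classify an error by the same rule: the error is solid if data does not survive a store and fetch, or if another machine check occurs during the exercise; otherwise it is intermittent. The Python sketch below captures that rule with caller-supplied store and fetch operations. The MachineCheck exception, the storage test patterns, and the simulated location are hypothetical; for the SPF key case the patterns are the 16 key values, 0 through 15.

```python
from typing import Callable, Iterable


class MachineCheck(Exception):
    """Stands in for a machine-check interruption taken during the exercise."""


def exercise(store: Callable[[int], None], fetch: Callable[[], int],
             patterns: Iterable[int]) -> str:
    """Classify a failing location (or storage key) as solid or intermittent."""
    for pattern in patterns:
        try:
            store(pattern)
            if fetch() != pattern:
                return "solid"      # data changed between store and fetch
        except MachineCheck:
            return "solid"          # another machine check during the exercise
    return "intermittent"


class SimulatedLocation:
    """Hypothetical storage location with optional stuck-at-one bits."""

    def __init__(self, stuck_bits: int = 0) -> None:
        self.value = 0
        self.stuck_bits = stuck_bits

    def store(self, value: int) -> None:
        self.value = value | self.stuck_bits

    def fetch(self) -> int:
        return self.value


STORAGE_PATTERNS = [0x00, 0xFF, 0x55, 0xAA]   # assumed bit patterns
KEY_PATTERNS = range(16)                      # all 16 possible storage keys

healthy = SimulatedLocation()
print(exercise(healthy.store, healthy.fetch, STORAGE_PATTERNS))  # intermittent
stuck = SimulatedLocation(stuck_bits=0x01)
print(exercise(stuck.store, stuck.fetch, STORAGE_PATTERNS))      # solid
bad_key = SimulatedLocation(stuck_bits=0x08)
print(exercise(bad_key.store, bad_key.fetch, KEY_PATTERNS))      # solid
```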



PROGRAM DAMAGE RECOVERY 

Having identified the hardware characteristics of the malfunction, MCH investigates the extent of the damage to the program executing at the time of the machine-check interruption. After assessing the damage to the program, MCH attempts to recover the system by associating the damage with a particular task and terminating that task. This keeps the system in operation at the expense of only one job. If the supervisor is damaged, the system must be reloaded.

The modules responsible for system re- 
covery are collectively known as the pro- 
gram damage assessment and repair modules 
or PDAR. To accurately assess the extent 
of the damage to the system, the PDAR 
modules use information placed in the MCH 
Common Area by the MCH Nucleus and Preli- 
minary Error Analysis. In turn, each PDAR 
module uses the Common Area to convey the 
results of its operation to its successor. 

In general, program damage recovery 
assists in the following: 

• Damage assessment - associating what is 
known about the hardware characteris- 
tics of the failure with the task occu- 
pying the location that was affected. 

• Task Termination - terminating any task 
that is in the problem program area. 

• System Termination - putting the system 
in the Wait state when the error occurs 
in the supervisor or Link Pack Area, 
making further system operation 
impossible. 

Program damage recovery procedures are 
necessary for: 

• Intermittent or solid SPF key error 

• Intermittent or solid main storage 
errors 

• Retry failed error 

To recover from an intermittent main 
storage or SPF key error in a problem pro- 
gram, the task is terminated by ABEND. 

To recover from a solid main storage 
error or SPF key error in a problem pro- 
gram, the task is terminated by setting its 
TCB nondispatchable. 

For an uncorrectable error caused by a 
failing instruction in a problem program, 
the task is terminated by ABEND. 

For all errors (SPF key, main storage, 
or failing retry) occurring in the supervi- 
sor or Link Pack Area, MCH places the sys- 
tem in the Wait state. 
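The recovery rules above form a small decision table. The Python sketch below encodes them. The enum and function names are hypothetical, and the single in_supervisor_or_lpa flag stands in for the damage-assessment work that the PDAR modules actually perform to locate the affected task.

```python
from enum import Enum, auto


class ErrorType(Enum):
    INTERMITTENT_STORAGE = auto()
    SOLID_STORAGE = auto()
    INTERMITTENT_SPF_KEY = auto()
    SOLID_SPF_KEY = auto()
    RETRY_FAILED = auto()


SOLID_ERRORS = {ErrorType.SOLID_STORAGE, ErrorType.SOLID_SPF_KEY}


def recovery_action(error: ErrorType, in_supervisor_or_lpa: bool) -> str:
    """Map an analyzed error and its location to the recovery action."""
    if in_supervisor_or_lpa:
        return "place system in wait state"  # any error in supervisor or LPA
    if error in SOLID_ERRORS:
        return "set TCB nondispatchable"     # solid storage or SPF key error
    return "ABEND task"                      # intermittent errors and retry failed


print(recovery_action(ErrorType.SOLID_SPF_KEY, in_supervisor_or_lpa=False))
# set TCB nondispatchable
print(recovery_action(ErrorType.RETRY_FAILED, in_supervisor_or_lpa=True))
# place system in wait state
```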



RECORDING AND TERMINATION 

The recording function of the Machine-Check Handler has two parts. The first is the normal error recording procedure of formatting an error record and eventually writing it on the SYS1.LOGREC data set. The second is emergency recording; that is, the recording attempted when MCH has determined that system continuation is impossible.



Error Recording 

The typical MCH error record is illustrated in Figure 13. It consists of the MCH abbreviated record (ABREC), the fixed logout, the extended logout, and the damage-assessment field of the MCH Common Area. This record is produced for all machine-check interruptions.

Error recording involves formatting a record and writing it into the SYS1.LOGREC data set. Before MCH terminates its operation, it formats the MCH error record. The actual writing of the record on the data set takes place after MCH terminates. MCH terminates before writing to decrease the chance of a second machine-check interruption occurring while MCH is executing. In other words, if an interruption takes place during the I/O operation and MCH has not yet terminated, the interruption would have to be handled as an error-on-error condition. If MCH has terminated and a machine-check interruption occurs, MCH can handle the interruption in the normal manner.

MCH does the following to put an error record in SYS1.LOGREC:

1. It formats the complete error record. 

2. It establishes the communications task 
(MCH Error Recorder) as an active 
task. 

3. It terminates itself by giving control 
to the Dispatcher or to the inter- 
rupted program. 

If the Dispatcher gets control, it dispatches the next ready task, which should be the communications task. The MCH Error Recorder then writes the MCH records into the SYS1.LOGREC data set.

Should another machine-check interrup- 
tion occur before the error record is writ- 
ten, the Error Recorder writes the short 
record of the first interruption and the 
complete error record of the second inter- 
ruption. If a third interruption occurs 
after the record is formatted but before it 
is written, the short record of the first 
and second interruptions and the complete 
record of the third interruption are 
recorded. 



The maximum number of error records that can be formatted for each recording operation is three. Therefore, if more than three interruptions occur before any recording is done, the record for the current interruption replaces the most recent soft error record. If no soft error records have been formatted, the most recent hard error record is replaced. Consequently, when the Error Recorder finally puts the error records into SYS1.LOGREC, they represent the three most recent machine checks, with the hard machine checks taking priority. Also, the number of lost records and their characteristics are included in the error record.
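The three-record limit and the replacement rule can be modeled as a small buffer. The Python sketch below is a model under stated assumptions, not MCH's actual data layout: the Record and RecordBuffer classes are hypothetical, earlier records are marked as short form when a newer interruption arrives (as described above for the second and third interruptions), and a replaced record is kept in a lost list so that it can be summarized.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Record:
    interruption: int      # sequence number of the machine check
    hard: bool             # True for a hard error, False for a soft error
    short: bool = False    # True once only the short form will be written


@dataclass
class RecordBuffer:
    capacity: int = 3
    records: List[Record] = field(default_factory=list)
    lost: List[Record] = field(default_factory=list)

    def add(self, record: Record) -> None:
        # Only the newest interruption keeps its complete record.
        for older in self.records:
            older.short = True
        if len(self.records) == self.capacity:
            soft = [i for i, r in enumerate(self.records) if not r.hard]
            # Replace the most recent soft record; if none, the most recent hard one.
            victim = soft[-1] if soft else len(self.records) - 1
            self.lost.append(self.records.pop(victim))
        self.records.append(record)


buffer = RecordBuffer()
for number, hard in enumerate([True, False, True, False, True, True], start=1):
    buffer.add(Record(number, hard))
print([r.interruption for r in buffer.records])  # [1, 3, 6]
print([r.interruption for r in buffer.lost])     # [2, 4, 5]
```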

The system remains in quiet mode until 
formatted error records have been written. 
Therefore, soft errors cannot cause for- 
matted error records to be overlaid. 

Note : The MCHDAMAG field of the MCH error record reflects the error analysis and action taken by MCH in its processing. The field is designed to be model independent in content. Thus, for a specific model, not all bits in the damage area or error type bytes of the field may be implemented. Specifically, the buffer, control storage, extended main storage, address, and mark bits are not set for all machines.

In addition, some RMS action data bits in this field are not set for all machines. The retry bit is set only if some type of software retry is attempted by MCH. The repair bit indicates that MCH has attempted to repair an SPF key failure. The reconfigure bit indicates that MCH has performed some type of main storage reconfiguration. The refresh bit is set only if MCH has refreshed a portion of main storage. The setting of any of these bits indicates that MCH has performed the indicated action but does not imply that MCH was able to resume the task that was in control at the time of the error. For instance, task or system termination may be necessary if the retry was unsuccessful, if a valid return point to the interrupted program is not available, or if an instruction is nonretryable.

Finally, for certain machine-check interruptions MCH makes an early determination that system termination is necessary and thus does not perform any further analysis as to type of error or area of damage. In these cases only the termination bit is set, possibly together with a system down code in the RMS action data area of the field.

Emergency Recording 

Emergency recording is necessary when the system cannot continue to operate. Instead of giving control to the operating system to write the error record, MCH writes it itself, since the system is known to be unreliable. The Emergency Recorder, an MCH transient module, determines the number of records formatted in the buffer and whether there is room in SYS1.LOGREC to record them. The writing is done by the Module Loader in the MCH Nucleus. When the error records have been placed in SYS1.LOGREC, control is given to the SHUT routine, which writes a message informing the operator of the status of the error and terminates the system.

Offset, decimal (hex)
Model 135    Model 145    Field name and description

                          Header: 24 bytes from MCHABREC in the MCH Common Area.
0 (0)        0 (0)          Byte 1       Record ID.
1 (1)        1 (1)          Byte 2       xx.. ....  System ID. OS = 00.
                                         ..x. ....  Not used.
                                         ...x xxxx  Release level.
2 (2)        2 (2)          Byte 3       1... ....  Operator action message.
                                         .1.. ....  System/370 machine.
                                         ..xx xxxx  Not used.
3 (3)        3 (3)          Byte 4       Record type information:
                                         1... ....  Short form of record.
                                         .1.. ....  Record incomplete.
                                         ..1. ....  MCH terminated the system.
                                         ...x xxxx  Not used.
4 (4)        4 (4)          Bytes 5-8    Not used.
8 (8)        8 (8)          Bytes 9-16   Date and time.
16 (10)      16 (10)        Bytes 17-24  CPU serial number.
24 (18)      24 (18)      Program ID: 8 bytes from MCHABREC in the MCH Common Area.
32 (20)      32 (20)      Job ID: 8 bytes from MCHABREC in the MCH Common Area.
40 (28)      40 (28)      MC Old PSW: 8 bytes from MCHABREC in the MCH Common Area.
48 (30)      48 (30)      MC Independent Logout: 280 bytes from the Fixed Logout Save Area.
-            -            Extended Logout of Model 135: 14 bytes contained within a scratch area of the Fixed Logout.
-            328 (148)    Extended Logout of Model 145: 192 bytes from the Extended Logout field (pointed to by control register 15).

Note : Because of the difference in Extended Logouts for the Models 135 and 145, MCH error record displacements differ for the two models from this point on.

328 (148)    520 (208)    Damage Assessment Data: 74 bytes.
                          Bytes 1-6 from MCHPDAR in the MCH Common Area.
328 (148)    520 (208)      Bytes 1-2    Length of this field.
330 (14A)    522 (20A)      Bytes 3-6    Address of the Machine Dependent Common Area.
                          Bytes 7-10 from MCHLOGIC in the MCH Common Area.
334 (14E)    526 (20E)      Bytes 7-10   First level interrupt control field.
                          Bytes 11-18 from MCHDAMAG in the MCH Common Area.

Figure 13. MCH error record (part 1 of 2)

Offset, decimal (hex)
Model 135    Model 145    Field name and description

                          Damage Assessment Data (continued)
338 (152)    530 (212)      Byte 11      System status.
                                         1... ....  Hardware recovery.
                                         .1.. ....  Software recovery.
                                         ..1. ....  Task aborted.
                                         ...1 ....  Task set nondispatchable.
                                         .... 1...  Termination.
                                         .... .xxx  Not used.
339 (153)    531 (213)      Byte 12      Damage area.
                                         1... ....  Main storage.
                                         .1.. ....  Buffer.
                                         ..1. ....  Control storage.
                                         ...1 ....  Extended main storage.
                                         .... 1...  Processor.
                                         .... .1..  Channel error.
                                         .... ..1.  Time-of-day clock error.
                                         .... ...x  Not used.
340 (154)    532 (214)      Byte 13      Error type.
                                         1... ....  Intermittent.
                                         .1.. ....  Solid.
                                         ..1. ....  Data.
                                         ...1 ....  Address.
                                         .... 1...  Mark.
                                         .... .1..  Protect.
                                         .... ..xx  Not used.
341 (155)    533 (215)      Byte 14      RMS action data.
                                         1... ....  Retry.
                                         .1.. ....  Repair.
                                         ..1. ....  Reconfigure.
                                         ...1 ....  Refresh.
                                         .... 1...  Machine-check interruption in MCH.
                                         .... .001  System damage.
                                         .... .010  Machine-check interruption in CCH.
                                         .... .011  I/O error.
                                         .... .100  Invalid logout.
342 (156)    534 (216)      Byte 15      Machine status data.
                                         1... ....  HIR in record mode.
                                         .1.. ....  ECC in record mode.
                                         ..1. ....  Buffer disabled.
                                         ...x xxxx  Not used.
343 (157)    535 (217)      Bytes 16-18  Not used.
                          Bytes 19-26 from MCHLSUM in the MCH Common Area.
346 (15A)    538 (21A)      Bytes 19-24  Lost record summary. This field represents the number of lost records and their characteristics.
352 (160)    544 (220)      Bytes 25-26  Not used.
                          Bytes 27-34 from MCHHISTY in the MCH Common Area.
354 (162)    546 (222)      Bytes 27-34  History of executed transient modules.
                          Bytes 35-74 from MCHPDAR in the MCH Common Area.
362 (16A)    554 (22A)      Bytes 35-74  PDAR data.

Figure 13. MCH error record (part 2 of 2)
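The model-dependent offsets in Figure 13 follow from the field lengths: 48 bytes of header, program ID, job ID, and old PSW, then the 280-byte fixed logout, then 192 bytes of extended logout on the Model 145 only (the Model 135's 14-byte extended logout lies inside the fixed logout). The Python sketch below derives the position of each damage assessment byte for either model and decodes byte 11 (system status). The function names are hypothetical, and the byte-11 bit order is as read from the figure, with bit 0 as the leftmost bit (X'80').

```python
# Fixed-length fields that precede the damage assessment data (Figure 13).
PREFIX_LENGTH = 24 + 8 + 8 + 8                  # header, program ID, job ID, MC old PSW
FIXED_LOGOUT_LENGTH = 280
EXTENDED_LOGOUT_IN_RECORD = {135: 0, 145: 192}  # Model 135's lies inside the fixed logout
DAMAGE_ASSESSMENT_LENGTH = 74

SYSTEM_STATUS_BITS = [                           # byte 11, bits 0-4; bits 5-7 not used
    "hardware recovery",
    "software recovery",
    "task aborted",
    "task set nondispatchable",
    "termination",
]


def damage_byte_offset(model: int, byte_number: int) -> int:
    """Record offset of byte N (1-74) of the damage assessment data."""
    if not 1 <= byte_number <= DAMAGE_ASSESSMENT_LENGTH:
        raise ValueError("damage assessment bytes are numbered 1-74")
    base = PREFIX_LENGTH + FIXED_LOGOUT_LENGTH + EXTENDED_LOGOUT_IN_RECORD[model]
    return base + byte_number - 1


def system_status(value: int) -> list[str]:
    """Names of the system-status flags set in byte 11."""
    return [name for bit, name in enumerate(SYSTEM_STATUS_BITS) if value & (0x80 >> bit)]


for model in (135, 145):
    offset = damage_byte_offset(model, 11)
    print(model, offset, hex(offset))  # 135 338 0x152, then 145 530 0x212
print(system_status(0x28))             # ['task aborted', 'termination']
```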

Interface with the Channel-Check Handler (CCH)

The Channel-Check Handler (CCH) is a 
resident program which receives control 
from the I/O Supervisor after detection of 
a channel data check, channel control 
check, or interface control check. 

The Machine-Check Handler may receive 
control during CCH operations because: 

• A machine-check interruption occurs 
during CCH processing, or 

• CCH determines that the operating sys- 
tem must be terminated. 

When CCH is entered, it places a code X'01' in bits 24-31 of the machine-check new PSW. This indicates to MCH, when a machine-check interruption occurs, that CCH is the affected program and that the system must be terminated. Only a machine-check error record is written in this case (see Figure 13).



When CCH determines that the system must 
be terminated because of a channel error, 
it: 

1. Constructs a full channel-check record 
entry (see Figure 14). 

2. Puts a code of X'0F' in bits 24-31 of the machine-check new PSW to indicate that CCH has created a record to be written and that the system must be terminated because of a channel error.

3. Loads the machine-check new PSW to pass control to MCH.

MCH uses error recording procedures to write the channel record on SYS1.LOGREC. The address of the channel record is in register 13 and its length is in the count field of the header (see Figure 14).

After MCH writes the record, it writes a console message that a channel error record has been recorded and places the system in a Wait state with a code of A0A.
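MCH tells the two CCH cases apart by the code CCH leaves in bits 24-31 of the machine-check new PSW, which is simply the fourth byte of the doubleword when bit 0 is the leftmost bit. The Python sketch below shows the dispatch. The function name and the returned action strings are hypothetical summaries of the text above, and treating any other value as normal MCH processing is an assumption.

```python
def cch_action(mc_new_psw: bytes) -> str:
    """Choose MCH's action from the CCH code in bits 24-31 of the MC new PSW."""
    if len(mc_new_psw) != 8:
        raise ValueError("a PSW is 8 bytes long")
    code = mc_new_psw[3]  # bits 24-31 are the fourth byte
    if code == 0x01:
        # Machine check while CCH was running: MCH record only, then terminate.
        return "write machine-check record; terminate system"
    if code == 0x0F:
        # CCH built a channel record (address in register 13) and wants shutdown.
        return "write channel record; console message; wait state A0A"
    return "normal MCH processing"


psw = bytes([0x00, 0x04, 0x00, 0x0F, 0x00, 0x00, 0x6F, 0x00])  # hypothetical PSW image
print(cch_action(psw))  # write channel record; console message; wait state A0A
```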
Offset
Dec   Hex   Field (length in bytes)
  0     0   Count Field (8)
  8     8   Switches (2) | Key (2) | Reserved (4)
 16    10   Date (4) | Time (4)
 24    18   CPU ID and Serial Number (8)
 32    20   Job ID (8)
 40    28   Active I/O Units (16)
 56    38   Failing CCW (8)
 64    40   CSW (8)
 72    48   Extended Channel Status Word (4) | Device Type (4)
 80    50   Channel ID (1) | Control Unit Address (3) | Multiprocessing Information (4)
 88    58   Channel Log (variable length)

Figure 14. CCH error record
OPERATION DIAGRAMS

The first set of operation diagrams (Figures 15 through 17) traces the general path for processing each type of error. Each major decision along the way is associated with the module that makes that decision.

The second set of operation diagrams (Figures 18 through 21) is more detailed. Each diagram is a series of frames that show the progress of an operation, such as initialization. Comments under each frame indicate what is being depicted.

The frames are numbered sequentially. When a decision results in a break from the normal sequence, an off-page connector composed of the figure number, figure part, and frame number directs the reader to the next frame. For example, a jump to Figure 20, part 3, frame 15 is shown as F20, 3-15. When a decision results in a break from the normal sequence to another frame on the same page, only the frame number is given inside the connector. For more detail, consult the flowcharts or the microfiche.
Figure 15. Main storage error, soft error, system damage, and CCH error

Figure 16. Flow of control for SPF key error

R = Resident
T = Transient
C = Common to MFT and MVT
M = Machine Dependent
S = System Dependent
* = Loading and transfer of control done through MCH Nucleus routines

Figure 17. Flow of control for CPU error
Figure 18. Initialization (Part 1 of 2)

At this point in the program's processing, all the critical fields have been saved and the system can again be enabled for hard machine-check interruptions. Control register 14 remains masked for soft errors and extended logouts.

Figure 18. Initialization (Part 2 of 2)
Figure 19. Hardware error analysis (Part 1 of 5)

PEA determines the type of Instruction Processing Damage from the Machine-Check Interruption Code. (See Figure 35 for an illustration of the bit settings for each type of Instruction Processing Damage.)

Figure 19. Hardware error analysis (Part 2 of 5)

Main storage is analyzed to determine whether the error is solid or intermittent. This is done by moving a series of bit patterns in and out of the failing location.

Figure 19. Hardware error analysis (Part 3 of 5)

Before proceeding to the Program Damage Recovery function,

From Figure 19, Part 2, Block 11.

Figure 19. Hardware error analysis (Part 4 of 5)

When the particular main storage error or SPF key error has been classified according to type, or if the error was due to an unretryable instruction or a "retry failed" condition, the Program Damage Recovery function continues the processing.

Figure 19. Hardware error analysis (Part 5 of 5)

Figure 20. Program damage recovery (Part 1 of 4)

Figure 20. Program damage recovery (Part 2 of 4)

Figure 20. Program damage recovery (Part 3 of 4)

Figure 20. Program damage recovery (Part 4 of 4)
Figure 21. Recording and termination (Part 1 of 6)

Determined from Machine-Check interruption code (see Figure 35 for illustration of bit settings).

MODEL 145

For:
• CPU Retry Successful Errors,
• ECC Corrected Main Storage Errors,
• ECC Corrected Errors in Control Storage, when Control Storage is in Recording Mode,
• and Timer Errors:

Go to F21, 3-13.

MODEL 145

For ECC Corrected Errors in Control Storage, when Control Storage is not in Recording Mode: schedule a message indicating the mode switch from threshold to quiet.

Note: For ECC Corrected errors in Control Storage, the Model 145 checks (when control storage is in threshold mode) whether the specified error threshold has been exceeded. If it has, the Model 145 switches from threshold to quiet mode and generates an MCI.

MODEL 135

For ECC-successful errors:
1) Enter ECC Quiet Mode
2) Schedule message IGF055I

For Timer errors:

MODEL 135

For CPU (HIR) retry successful errors:
1) Add 1 to an error count
2) If error count = 10,
   a) Enter ECC and HIR Quiet Mode
   b) Schedule message IGF055I

When Control Storage is in quiet mode, MCH is not entered for a soft MCI.

Figure 21. Recording and termination (Part 2 of 6)

Otherwise, exit is made to the Dispatcher, and the Dispatcher gives control to the next "ready" task, which should be the Communications task. The Error Recorder puts the MCH Error Record into the SYS1.LOGREC data set.
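The Model 135 soft-error comments in Figure 21 describe a counter with a threshold of 10 for CPU (HIR) retries, and an immediate switch to quiet mode for ECC-successful errors. A minimal Python sketch of that bookkeeping follows; the class and attribute names are hypothetical, and message IGF055I is recorded only by its ID.

```python
class Model135SoftErrors:
    """Hypothetical model of the Model 135 soft-error thresholds."""

    HIR_THRESHOLD = 10

    def __init__(self) -> None:
        self.hir_count = 0
        self.ecc_quiet = False
        self.hir_quiet = False
        self.messages: list[str] = []

    def ecc_successful(self) -> None:
        # ECC-successful error: enter ECC quiet mode and schedule IGF055I.
        self.ecc_quiet = True
        self.messages.append("IGF055I")

    def hir_retry_successful(self) -> None:
        # CPU (HIR) retry successful: count it; at 10, go quiet for ECC and HIR.
        self.hir_count += 1
        if self.hir_count == self.HIR_THRESHOLD:
            self.ecc_quiet = True
            self.hir_quiet = True
            self.messages.append("IGF055I")


cpu = Model135SoftErrors()
for _ in range(9):
    cpu.hir_retry_successful()
print(cpu.hir_quiet)                 # False
cpu.hir_retry_successful()
print(cpu.hir_quiet, cpu.messages)   # True ['IGF055I']
```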
SECTION 3 : PROGRAM ORGANIZATION



This section contains descriptions and flowcharts for MCH modules. Each module description contains:

• English name of the module
• Module ID
• Functions
• Operation

Note : A module's ID is the same as its chart ID; the module ID corresponds to the module ID found in the microfiche listing. In addition, the charts are in the same sequence as the module descriptions.



MCH INITIALIZATION 

Module ID : IGFMCHFO 

Functions : MCH Initialization completes the initialization process started by the System Nucleus Initialization Program (NIP). The part of MCH initialization performed by NIP is summarized here.

Preliminary initialization of MCH during NIP processing ensures that:

• MCH is incorporated into the operating 
system nucleus 

• MCH is initialized with the values and 
addresses it needs for processing 
interruptions 

Before passing control to IGFMCHFO, NIP: 

• Checks whether the Machine-Check Handl- 
er is for the correct machine; if not, 
it issues a message to the operator 
informing him that MCH is inoperative. 

• Loads the MCH Nucleus (IGFMCHEO) from 
the SYS1.LINKLIB into the dynamic area 
and passes control to it. One of the 
parameters NIP passes to the MCH Nuc- 
leus is a pointer to the first location 
in lower main storage which is not a 
part of the operating system nucleus. 
The MCH Nucleus relocates itself to 
that address, making itself contiguous 
with the operating system nucleus. MCH 
also updates the pointer to point to 
the new end of the operating system 
nucleus (including the MCH Nucleus). 

The MCH Nucleus returns control to NIP, which deletes the copy of the MCH Nucleus in the dynamic area, loads the MCH initialization module (IGFMCHFO) from SYS1.LINKLIB, and passes control to it.



Operation : IGFMCHFO initializes MCH in two stages.

During Stage 1 of its processing, 
IGFMCHFO: 

1. Allocates space for the MCH Transient 
Area. Space is allocated by adding 
the number of bytes needed for the 
transient area to the address of the 
end of the operating system nucleus. 
For the Models 135 and 145, 1K bytes
are needed. IGFMCHFO then loads the 
appropriate Soft Machine-Check Handler 
into the transient area. 

2. Allocates space for, and initializes, 
the Model-Dependent Common Area. 

3. Initializes the Machine Status Block 
in the operating system nucleus with 
Multiple Console Support (MCS) control 
information for the nucleus. 

4. Allocates space for, and initializes, 
the Model-Independent Common Area. This area serves as the MCH Communications area and is 1K bytes long.

5. Allocates space for the fixed logout 
save area. The fixed logout save area 
is 280 bytes long. 

6. Allocates space for the Extended 
Logout. For the Model 145, the 
Extended Logout is 192 bytes long and 
is pointed to by control register 15. 
For the Model 135, the Extended Logout 
is 14 bytes long and is located within 
a scratch area of the Fixed Logout (at 
displacement 256, decimal).

7. Allocates a subsystem data area of 64 
bytes if there is a subsystem present. 

8. Initializes control register 14 with 
the machine-check mask. 

9. Initializes pointers in the MCH 
Nucleus. 

10. Initializes MCH fields in the Dis- 
patcher. A pointer to the Post ECB 
routine in the MCH Nucleus is stored 
in the Dispatcher. 

11. Initializes the machine-check new PSW. 
12. Initializes fields for the Module Scheduler. The IDs and TTRs of the MCH transient modules on SYS1.SVCLIB are placed in the MCHTTRS field in the MCH Model-Independent Common Area. Successor IDs for those modules having successors are placed in the MCHNXIDS field.
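Stage 1 allocates each area by advancing the end-of-nucleus address by the area's size. The Python sketch below shows that bump-pointer arithmetic with the sizes given in this section (1K taken as 1024 bytes). Several details are assumptions: the areas are laid out contiguously in the order listed, no alignment padding is applied, and the size of the Model-Dependent Common Area, which this section does not state, is passed in as a parameter. The function name is hypothetical.

```python
def allocate_mch_areas(end_of_nucleus: int, model: int, model_dependent_size: int,
                       subsystem_present: bool) -> tuple[dict[str, tuple[int, int]], int]:
    """Return {area: (address, length)} and the new end-of-nucleus address."""
    areas = [
        ("transient area", 1024),
        ("model-dependent common area", model_dependent_size),  # size not given here
        ("model-independent common area", 1024),
        ("fixed logout save area", 280),
    ]
    if model == 145:
        areas.append(("extended logout", 192))  # Model 135: inside the fixed logout
    if subsystem_present:
        areas.append(("subsystem data area", 64))

    layout = {}
    cursor = end_of_nucleus
    for name, length in areas:
        layout[name] = (cursor, length)
        cursor += length  # bump the end-of-nucleus pointer
    return layout, cursor


layout, new_end = allocate_mch_areas(0x8000, model=145, model_dependent_size=512,
                                     subsystem_present=False)
print(hex(layout["fixed logout save area"][0]), hex(new_end))  # 0x8a00 0x8bd8
```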

After initializing the Module Scheduler, IGFMCHFO returns control to NIP, which saves the end-of-nucleus pointer and deletes the copy of IGFMCHFO. Next, NIP constructs the System Queue Area (SQA), sets up the Link Pack Area, and relocates itself to just above the SQA. NIP then loads IGFMCHFO again and passes control to it.

During Stage 2 of its processing, 
IGFMCHFO initializes pointers in the MCH 
Common Area for the SVC and LINKLIB BLDL 
tables and then returns control to NIP, 
which deletes the copy of IGFMCHFO that was 
just loaded. 



MCH NUCLEUS 

Module ID : IGFMCHEO 

Functions : The Nucleus:

1. Initializes the working environment of the Machine-Check Handler.

2. Screens out machine-check interruptions that occur during operation of the Channel-Check Handler.

3. Analyzes the cause of the malfunction and chooses the successor module.

4. Handles unexpected machine-check interruptions during MCH processing.

5. Intercepts Monitor Call interruptions and effectively treats them as no-ops.

6. Handles module loading using the Input/Output Supervisor (IOS).

7. Handles system termination when required.

Operation : The MCH Nucleus receives control through the machine-check new PSW with the system disabled for all interruptions. It immediately masks out extended logouts and soft machine-check interruptions (the masking out is done in control register 14) and saves the data which is critical to the analysis of the error. This data includes:

1. The fixed logout (found in locations 232-511 decimal), which is saved immediately following the Record Buffer Build Area.

2. The machine-check old PSW, which is saved at MCHRPSW in the Common Area.

This data is saved immediately upon entry to the MCH Nucleus to avoid the destruction of data if a second machine check occurs. After the data is stored, the address in the machine-check new PSW is changed to point to the SHUT routine (IGFERRO), and the address in the program-check new PSW is changed to point to a special program-check handler routine (PROGCHEK). The system is then reenabled for hard machine-check interruptions by setting PSW bit 13 to 1. The Program Status Area (locations 0-127 decimal) is saved at MCHPSA in the Independent Common Area.

After the above initialization process has been completed, the MCH Nucleus screens out any machine-check interruptions originating in the Channel-Check Handler, passing control to the Soft Machine-Check Handler to initiate the recording of the Channel Check Record for those interruptions.

The MCH Nucleus then examines the machine-check interruption code in the fixed logout to determine the successor module. After the type of error condition is identified, the Module Loader portion of the MCH Nucleus is given control.

The Special Handler of Unusual Terminations (SHUT) subroutine is located in the MCH Nucleus and entered at IGFERRO from the Emergency Recorder or from Preliminary Error Analysis when the operating system is to be placed in a wait state. SHUT writes a message containing the proper wait state code and loads a wait state PSW.



MCH MODULE LOADER 

The Module Loader portion of the MCH 
Nucleus calls the I/O Supervisor to load 
the MCH transient modules. The loading is 
done by three subroutines, always resident 
within the MCH Nucleus: 

• The module scheduling subroutine 

• The I/O initialization subroutine 

• The module loading subroutine 

These subroutines, together with the
services of the I/O Supervisor, handle all
I/O for the Machine-Check Handler except
error recording and communication with the
operator.

Function of the Module Scheduler: The
Module Scheduler interfaces between modules
in the MCH Transient Area when they specify
a successor module. The Module Scheduler
also maintains a history table of module
execution. This history table (see Section
5) becomes part of the MCH error record
when it is a full record.



Operation of the Module Scheduler: The
Module Scheduler uses two tables to
schedule a successor module:



• A successor table (pointed to by
MCHNXIDS in the Independent Common
Area) that contains the IDs of successor
modules for each module that may
call a successor. An ID is a one-byte
field containing the last two digits of
the module ID; for example, the XX of
IGFMCHXX (see Figure 12).

• A displacement table (pointed to by 
MCHTTRS in the Independent Common Area) 
that contains the ID and relative track 
and record address (in TTR format) of 
each transient module as well as the 
displacement into the successor table 
to locate the successors for that 
module. 

A module designates its successor by
storing a one-byte path number in MCHNXMOD
(in the Independent Common Area) before
relinquishing control. The Module Scheduler
then uses this path number plus a
previously saved displacement value (see
below) to index into the successor table,
obtaining the ID of the desired successor
module.

The Module Scheduler then searches the
MCHTTRS table for a matching ID. When the
ID is found, the corresponding TTR is
obtained along with the displacement value
for the new module's successor list. The
TTR is saved for later conversion to an
absolute track and record address (in
MBBCCHHR format). The displacement is
saved for use during the next execution of
the Module Scheduler.
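The two-table lookup described above can be sketched in Python as a reading aid. This is an illustrative model, not the MCH code. The class and function names are hypothetical. The path number is assumed to be a zero-based index added to the saved displacement. Each one-byte ID is assumed to hold the two trailing characters of the module name as hexadecimal digits (for example, X'41' for IGFMCH41).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransientEntry:
    """One entry of the displacement (MCHTTRS) table, modeled as a record."""
    module_id: int   # one-byte ID, e.g. 0x41 for IGFMCH41 (assumed encoding)
    ttr: int         # relative track (TT) and record (R) on SYS1.SVCLIB
    succ_disp: int   # displacement of this module's list in the successor table


def schedule_successor(successor_table, ttr_table, saved_disp, path_number):
    """Return (TTR of the successor, displacement to save for the next call).

    successor_table models MCHNXIDS, ttr_table models MCHTTRS, and
    path_number is the byte the current module stored in MCHNXMOD.
    """
    succ_id = successor_table[saved_disp + path_number]
    for entry in ttr_table:  # search MCHTTRS for a matching ID
        if entry.module_id == succ_id:
            return entry.ttr, entry.succ_disp
    raise LookupError(f"module ID {succ_id:#04x} not found in MCHTTRS")


# Hypothetical table contents, for illustration only.
SUCCESSORS = [0x41, 0x40, 0xF1, 0xE3]
TRANSIENTS = [
    TransientEntry(0x40, 0x000101, 0),
    TransientEntry(0x41, 0x000102, 2),
    TransientEntry(0xF1, 0x000201, 3),
    TransientEntry(0xE3, 0x000202, 0),
]

ttr, disp = schedule_successor(SUCCESSORS, TRANSIENTS, saved_disp=0, path_number=0)
print(f"load TTR {ttr:06X}, next displacement {disp}")
```

The returned displacement is the value the text describes as saved for the next execution of the Module Scheduler. The caller passes it back in as `saved_disp` when the loaded module later names its own successor.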

The Module Scheduler then: 

1. Saves the TTR of the module to be 
loaded in the MCHTTRIN field of the 
Independent Common Area. 

2. Prepares the channel program to load
the module, placing the channel program
address in an IOB (labeled MCHIOB).

3. Loads the address of the SYS1.SVCLIB 
Data Extent Block (DEB) into register 
1. 

4. Passes control to IGFLOAD in the Nuc- 
leus to complete preparations for the 
I/O operation. 



Upon return of control from IGFLOAD, the 
Module Scheduler passes control to the 
module that was just loaded. 

Function of I/O Initialization (IGFLOAD):
The I/O Initialization subroutine converts
the relative device address (TTR format) to
an absolute address (MBBCCHHR format). It
also completes the initialization of those
I/O control blocks required by the module
loading routine.

Operation of I/O Initialization: I/O
Initialization obtains an absolute device
address using a system device-dependent
characteristics table (IECZDTAB) and the
extent information in the DEB whose address
is in register 1.
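The TTR-to-MBBCCHHR conversion can be sketched as follows. The following are assumptions made for illustration:

- the simplified extent layout (a starting cylinder and head plus a track count per extent);
- the 20-tracks-per-cylinder value in the usage line;
- the function names;
- a bin number (BB) of zero.

In the real routine, the extents come from the DEB and the device geometry comes from IECZDTAB.

```python
from typing import NamedTuple


class Extent(NamedTuple):
    """Simplified DEB extent: first track (cylinder, head) and its length."""
    start_cyl: int
    start_head: int
    num_tracks: int


def ttr_to_mbbcchhr(ttr, extents, tracks_per_cyl):
    """Convert a 3-byte TTR to an (M, BB, CC, HH, R) tuple."""
    rel_track, record = ttr >> 8, ttr & 0xFF
    for m, ext in enumerate(extents):
        if rel_track < ext.num_tracks:
            # Absolute track number counted from cylinder 0, head 0.
            abs_track = ext.start_cyl * tracks_per_cyl + ext.start_head + rel_track
            cc, hh = divmod(abs_track, tracks_per_cyl)
            return (m, 0, cc, hh, record)   # BB assumed to be zero
        rel_track -= ext.num_tracks          # move on to the next extent
    raise ValueError(f"TTR {ttr:06X} lies outside the data set's extents")


extents = [Extent(10, 5, 30), Extent(50, 0, 20)]
print(ttr_to_mbbcchhr(0x001103, extents, tracks_per_cyl=20))  # (0, 0, 11, 2, 3)
```

In the usage line, relative track 17 of the first extent lies past the end of cylinder 10. The conversion therefore rolls over to cylinder 11, head 2.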

I/O Initialization then passes control 
to the module loading routine at entry 
point IGFIORTN. 

Function of the Module Loader (IGFIORTN):
The Module Loader uses a special interface
with the I/O Supervisor to load the MCH
transient modules from SYS1.SVCLIB. The
Module Loader also performs the I/O
operations for the Emergency Recorder (if
the system is going into a wait state).

Operation of the Module Loader: Routines
requesting a load operation enter the
Module Loader after initializing the DCB,
DEB, IOB, and CCW chain for the module to
be loaded.

All registers and the I/O new PSW 
address are saved; the address in the new 
I/O PSW is replaced with the address of the 
MCH First Level Interrupt Handler (MCHFLIH) 
in the MCH Nucleus. The DCB specified in 
the DEB is chained to the MCHIOB. The 
address of an appendage table (APNTBL in 
the MCH Nucleus) is stored in the DEB. 

When control is passed to IOS to Start
I/O, register 14 contains the address of a
parameter (MCHIOSWD) and register 1
contains the address of the IOB for
execution. The action that IOS is to
perform is indicated in the MCHIOSWD
parameter as follows:

Bit 0=0     Normal MCH entry; honor the request
Bit 0=1     Final MCH entry; dequeue the MCH RQE
Bit 1=1     Clear busy and post indicators in the UCB
Bit 2=1     Internal MCH recursion indicator
Bits 3-31   Reserved
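As a reading aid for the table above, this Python sketch decodes the parameter word. It assumes MCHIOSWD is a 32-bit fullword that uses System/370 bit numbering, in which bit 0 is the leftmost (high-order) bit. The function names are illustrative.

```python
def bit(word, n, width=32):
    """Return bit n of `word`, numbering bits from the left (bit 0 = high-order)."""
    return (word >> (width - 1 - n)) & 1


def decode_mchioswd(word):
    """List the requests encoded in the MCHIOSWD parameter word."""
    actions = ["final MCH entry: dequeue the MCH RQE" if bit(word, 0)
               else "normal MCH entry: honor the request"]
    if bit(word, 1):
        actions.append("clear busy and post indicators in the UCB")
    if bit(word, 2):
        actions.append("internal MCH recursion")
    return actions  # bits 3-31 are reserved and ignored here


print(decode_mchioswd(0xC0000000))
```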

IOS returns to MCH 4 bytes past the
beginning of the in-line coded parameter
MCHIOSWD. MCH reestablishes a base register
and loads register 2 with the address of
the UCB. MCH issues the IOSGEN macro
instruction to enable the channel it has to
use and then enters a pseudo wait state (a
bit-spin loop). The completion of an MCH
I/O event (via MCH appendage routines)
causes exit from the loop.

If the MCH I/O event does not complete,
the bit-spin loop eventually times out.
When this happens, MCH attempts to load the
Soft Machine-Check Handler to write an
error record and enter a wait state.
Should a second timeout occur during this
attempt, MCH immediately enters a wait
state without writing an error record.
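The control flow of the pseudo wait state and its two-level timeout can be modeled as shown below. This is a sketch under the following assumptions:

- the loop limit is an arbitrary iteration count, because the real timeout value is not given here;
- each completion test is a hypothetical stand-in for the indication that the MCH appendage routines set;
- the outcome strings are illustrative.

```python
from typing import Callable


def spin_until(io_complete: Callable[[], bool], limit: int) -> bool:
    """Bit-spin loop: poll the completion indication at most `limit` times."""
    for _ in range(limit):
        if io_complete():
            return True
    return False  # timed out


def load_with_timeout(load_done, smch_load_done, limit):
    """Outcome of a transient load under the two-timeout rule."""
    if spin_until(load_done, limit):
        return "module loaded"
    # First timeout: try to bring in the Soft Machine-Check Handler so it
    # can write an error record before the system enters a wait state.
    if spin_until(smch_load_done, limit):
        return "error record written, then wait state"
    # Second timeout: give up on recording.
    return "wait state without an error record"


def completes_after(n):
    """Helper: an indication that reads as complete on the n-th poll."""
    polls = 0

    def poll():
        nonlocal polls
        polls += 1
        return polls >= n
    return poll


print(load_with_timeout(completes_after(3), completes_after(1), limit=10))
```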

The IOSGEN macro instruction is then 
issued to turn off the enabled channel, the 
I/O new PSW is stored, and the MCHECB is 
tested to determine the success of the load 
operation. 

If the load operation is successful, the 
Module Loader restores registers and 
returns control to the Module Scheduler via 
register 14. If the I/O operation to per- 
form a load is unsuccessful, the Module 
Loader returns control via register 14 plus 
a displacement of 4. 

The Abnormal End Appendage routine is
entered upon detection of an unrecoverable
I/O error by IOS. IOS describes the error
by posting a code in the MCHIOECB, a field
of the MCHIOB in the Independent Common
Area. When that code is X'7F', the
operation in error was successfully retried,
and the Abnormal End Appendage routine exits
to IOS via register 14 (with a displacement
of zero). When that code is X'44', a
device-end error occurred, and the Abnormal
End Appendage routine returns to IOS via
register 14 with an offset of 8. When that
code is X'41', a permanent error has
occurred. In this case, the Abnormal End
Appendage routine places X'41' in the MCHECB
(in the MCH Nucleus) and returns to IOS via
register 14 with a displacement of 12,
bypassing the Post routine and returning the
Request Queue Element (RQE) to a free list.
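The return-offset logic of the Abnormal End Appendage can be summarized in a short Python sketch. It is a model only. The dictionary standing in for MCH Nucleus storage is hypothetical. The first code is read here as X'7F', which is the usual OS/360 "completed normally" post code. Codes that the text does not mention are rejected, which is a choice made for this sketch.

```python
def abnormal_end_appendage(post_code, nucleus):
    """Return the offset MCH adds to register 14 when it returns to IOS.

    post_code is the code IOS posted in MCHIOECB; nucleus is a dict that
    stands in for the MCH Nucleus storage holding MCHECB.
    """
    if post_code == 0x7F:        # operation in error was retried successfully
        return 0
    if post_code == 0x44:        # device-end error
        return 8
    if post_code == 0x41:        # permanent error
        nucleus["MCHECB"] = 0x41  # post the error for the Module Loader
        return 12                 # bypass Post; the RQE goes to the free list
    raise ValueError(f"post code X'{post_code:02X}' is not covered by this sketch")
```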

The Normal End Appendage routine sets a 
successful completion code in the MCHECB 
and exits to IOS via register 14 plus a 
displacement of 12. IOS determines whether 
it was entered from the Abnormal or the 
Normal End Appendage routine by the MCHECB 
posting. 



SOFT MACHINE-CHECK HANDLER (MODEL 135 ONLY) 

Module ID : IGFMCH50 

Functions: The Soft Machine-Check Handler
(SMCH):

1. Prepares error records for the
SYS1.LOGREC data set.

2. Determines the type of soft errors and
performs any mode switching required
as a result of soft errors. It issues
a message whenever a mode switch is
performed because of an error.

3. Performs normal termination of the
Machine-Check Handler.

Operation: The Soft Machine-Check Handler
is loaded into the MCH Transient Area
during system initialization and may be
overlaid by subsequent MCH modules loaded
for the processing of hard machine-check
interruptions.

Mode Handling for the Model 135: When
ECC-corrected single-bit storage errors
occur at a rate of at least 256 errors in
416 microseconds, the Error Frequency Limit
(EFL) is exceeded and a soft machine-check
interruption occurs. To process these
errors, the Soft Machine-Check Handler:

1. Issues a Diagnose instruction to prevent
the EFL function from causing additional
interruptions.

2. Sets the ECC Quiet indicator in the
Machine Status Block.

3. Indicates storage damage in the
MCHDAMAG field of the Common Area.

4. Schedules the message:
IGF055I QUIET MODE ECC.

5. Continues with Record Buffering and
Formatting (see below).

Note that no instruction stream can access
main storage fast enough to exceed the EFL.
For this reason, only control storage
errors can generate soft machine-check
interruptions as the result of exceeding
the EFL.

When the twentieth CPU-retry corrected
error occurs, the Soft Machine-Check
Handler:

1. Sets bit 4 of control register 14 to
disable all soft machine-check
interruptions (both ECC and CPU retry).

2. Sets the HIR (CPU retry) and ECC Quiet
indicators in the Machine Status
Block. Whenever CPU retry is set to
quiet mode, ECC is also set to quiet
mode, since the bit set in control
register 14 disables all soft
machine-check interruptions.

3. Indicates storage damage in the
MCHDAMAG field of the MCH Common Area.

4. Schedules the message:
IGF055I QUIET MODE ECC, HIR.

5. Continues with Record Buffering and
Formatting (see below).

Note that soft machine-check interruptions
not requiring a mode switch are merely
handled as described under "Record
Buffering and Formatting."
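The Model 135 mode-switching rules can be condensed into a small state model. This is a sketch, not the SMCH logic itself. The status fields, the cause strings, and the assumption that the CPU-retry count is simply accumulated from system initialization are hypothetical. Masking by control register 14 is shown only as a flag, because the sketch does not model the register itself.

```python
from dataclasses import dataclass, field

CPU_RETRY_LIMIT = 20  # the twentieth corrected CPU-retry error triggers quiet mode


@dataclass
class Status135:
    """Hypothetical snapshot of the indicators the SMCH changes."""
    efl_enabled: bool = True        # EFL reporting (turned off via Diagnose)
    soft_mci_enabled: bool = True   # governed by bit 4 of control register 14
    ecc_quiet: bool = False         # Machine Status Block indicators
    hir_quiet: bool = False
    storage_damage: bool = False    # indication placed in MCHDAMAG
    cpu_retries: int = 0
    messages: list = field(default_factory=list)


def soft_check_135(st: Status135, cause: str) -> None:
    """Apply the mode-switch rules for one soft machine check."""
    if cause == "efl_exceeded":
        st.efl_enabled = False
        st.ecc_quiet = True
        st.storage_damage = True
        st.messages.append("IGF055I QUIET MODE ECC.")
    elif cause == "cpu_retry":
        st.cpu_retries += 1
        if st.cpu_retries == CPU_RETRY_LIMIT:
            st.soft_mci_enabled = False   # masks ECC and CPU-retry reports alike
            st.hir_quiet = st.ecc_quiet = True
            st.storage_damage = True
            st.messages.append("IGF055I QUIET MODE ECC, HIR.")
    # Either way, processing continues with record buffering and formatting.
```

Setting both quiet indicators together mirrors the text. Once the control register 14 bit is set, neither ECC nor CPU-retry soft interruptions are reported.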

Record Buffering and Formatting: The record
buffering routine of the Soft Machine-Check
Handler (SMCH) scans the record, storing
the address of the current record buffer in
the MCHLONG field of the Independent Common
Area for use by the record formatting
routine. At the end of each record buffer,
there is a flag byte, which indicates the
status of the record:

Bit 0=1   Active record
Bit 1=1   Short record
Bit 2=1   Full record (fixed and extended
          logouts are still intact)
Bit 3=1   The previous record in the
          buffer has been overlaid.
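A short decoder for the flag byte shows how the four status bits map onto byte values. It assumes System/370 bit numbering within the byte, in which bit 0 is the leftmost bit (X'80'). The names are illustrative.

```python
FLAG_BITS = (
    (0, "active record"),
    (1, "short record"),
    (2, "full record"),
    (3, "previous record overlaid"),
)


def decode_flag_byte(flag):
    """Name the status bits set in a record-buffer flag byte (bit 0 = X'80')."""
    return [name for bit, name in FLAG_BITS if flag & (0x80 >> bit)]


print(decode_flag_byte(0xA0))  # ['active record', 'full record']
```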

Since there is room in the MCH record
buffer for only one record containing the
fixed and extended logout, a record is
overlaid if a second hard machine-check
interruption is generated before the
previous record has been recorded. However,
to prevent the complete loss of records,
the most critical parts of each record are
saved, and short records (up to 3) may be
generated and queued in the short record
buffer.

If more than three hard machine checks 
occur before recording, previous short 
error records are overlaid and a counter is 
updated, indicating the number of records 
lost. Figure 22 shows the extended and 
fixed logout areas (the extended logout for 
the Model 135, however, is only 14 bytes 
long and is contained within the fixed 
logout), the short record buffers, the 
order in which they may be overlaid, and 
the location and use of the lost record 
counter. 
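The queueing of short records and the lost-record counter can be modeled as shown below. The three-buffer capacity comes from the text. The choice of which buffer to overlay is an assumption. It loosely follows the partially legible overlay sequence in Figure 22 (a record for a recovered hard MCI before one for an unrecovered hard MCI) and falls back to the oldest record. The class and field names are hypothetical.

```python
from collections import Counter

OVERLAY_ORDER = ("recovered", "unrecovered")  # assumed overlay priority


class ShortRecordBuffers:
    """Model of the short record buffers (ABRECs) and the lost-record counter."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self.records = []       # queued short records as (source, kind)
        self.lost = Counter()   # lost-record counter keyed by (source, kind)

    def add(self, source, kind):
        """Queue a short record, overlaying an earlier one if all are full."""
        if len(self.records) == self.capacity:
            victim = self._victim()
            self.lost[self.records.pop(victim)] += 1
        self.records.append((source, kind))

    def _victim(self):
        for wanted in OVERLAY_ORDER:
            for i, (_, kind) in enumerate(self.records):
                if kind == wanted:
                    return i
        return 0  # fallback assumption: overlay the oldest record


bufs = ShortRecordBuffers()
for src, kind in [("ECC", "unrecovered"), ("CPU", "recovered"),
                  ("CPU", "unrecovered"), ("ECC", "recovered")]:
    bufs.add(src, kind)
print(bufs.records, dict(bufs.lost))
```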

The record buffer routine exits to the
record formatting routine, which completes
the MCH record by setting up short records
(ABRECS).

Note: See Figure 13 for the format of the
MCH record.

MCH Termination: MCH relinquishes control
at the end of its processing in one of the
following ways:

• It returns control to the interrupted
program if that program was disabled
for I/O at the time of the interruption.
Before returning to the interrupted
program, MCH sets a No-operation
instruction in the Dispatcher, so that
the error record is posted when the
interrupted program gives up control.

• It gives control to the nucleus to post
the error record and, upon return, exits
to the Dispatcher (when normal recording
is to be done).

• It gives control to the Emergency
Recorder (IGFMCHE3) when continuation of
system operation is impossible (determined
by a bit set in MCHDAMAG).
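The choice among the three exits can be expressed as a small decision function. This is a sketch with hypothetical names. The order of the tests, with the MCHDAMAG termination bit checked first, is an assumption that the text does not state.

```python
def mch_exit(must_stop: bool, disabled_for_io: bool) -> str:
    """Pick one of the three MCH exits described above (sketch).

    must_stop models the termination bit in MCHDAMAG; disabled_for_io tells
    whether the interrupted program was disabled for I/O interruptions.
    """
    if must_stop:
        return "give control to the Emergency Recorder (IGFMCHE3)"
    if disabled_for_io:
        return "set No-operation in Dispatcher; return to interrupted program"
    return "nucleus posts error record; exit to Dispatcher"
```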



SOFT MACHINE-CHECK HANDLER (MODEL 145 ONLY) 

Module ID : IGFMCH40 

Functions: The Soft Machine-Check Handler
(SMCH):

1. Prepares error records for recording
on the SYS1.LOGREC data set.

2. Determines the type of soft errors and
the mode of operation for storage
errors. It issues a message when the
hardware has placed control storage in
the quiet mode as a result of exceeding
the allowable error frequency.

3. Performs normal termination for the
Machine-Check Handler.

Operation: The Soft Machine-Check Handler
is loaded into the MCH Transient Area
during system initialization and may be
overlaid by subsequent MCH modules loaded
for the processing of hard machine-check
interruptions.

Mode Handling for the Model 145: When a
corrected CPU error or an ECC-corrected
main storage error occurs, IGFMCH40 sets
appropriate bits in the damage assessment
field (MCHDAMAG of the Independent Common
Area); puts a message code into the buffer
(indicating that the CPU retry or ECC
successful message is to be issued); and
passes control to the record buffer
management portion of the Soft
Machine-Check Handler (discussed below).

For control storage errors, SMCH tests
for record mode. When in record mode, SMCH
handles these errors in the same manner as
a CPU retry or an ECC-corrected main
storage error. Otherwise, SMCH schedules a
message indicating a switch (which occurred
before the machine-check interruption) from
threshold to quiet mode.
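The Model 145 branching can be summarized as follows. The error labels, mode names, and action strings are hypothetical. Processing after the mode-switch message is not described in the text, so the sketch returns only the message action in that case.

```python
def smch145_actions(error, control_storage_mode="record"):
    """Actions IGFMCH40 takes for one soft error (simplified sketch).

    error: "cpu_retry", "ecc_main" or "control_storage" (hypothetical labels)
    """
    routine = ["set bits in MCHDAMAG",
               "queue CPU-retry/ECC-successful message code",
               "record buffer management"]
    if error in ("cpu_retry", "ecc_main"):
        return routine
    if error == "control_storage":
        if control_storage_mode == "record":
            return routine          # handled like a CPU retry or ECC error
        # The hardware switched control storage from threshold to quiet
        # mode before the interruption; SMCH schedules a message about it.
        return ["schedule threshold-to-quiet mode-switch message"]
    raise ValueError(f"unexpected soft error type {error!r}")
```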

Record Buffering and Formatting: The record
buffering routine of SMCH scans the record,
storing the address of the current record
buffer in the MCHLONG field of the
Independent Common Area for use by the
record formatting routine. At the end of
each record buffer there is a flag byte,
which indicates the status of the record:

Bit 0=1   Active record.
Bit 1=1   Short record.
Bit 2=1   Full record (fixed and extended
          logouts are still intact).
Bit 3=1   The previous record in the
          buffer has been overlaid.

Figure 22. Use of buffers and the lost-record counter in recording

[Figure: Information provided by the hardware
at location X'E8' is moved by MCH to the
Fixed Logout. MCHLONG points to the current
short record buffer (ABREC). MCH moves data
from the Fixed Logout and other locations
into the short record buffers, each of which
ends in a flag byte (bit 0=1: active record;
bit 1=1: short record; bit 2=1: full record;
bit 3=1: previous record overlaid). The
MCHDAMAG field is also shown.]

The lost-record counter keeps track of the
following:

1. Hardware retry
   a. Lost CPU ABEND MCI records
   b. Lost CPU Recovered MCI records
   c. Lost CPU Soft MCI records

2. ECC
   a. Lost ECC ABEND MCI records
   b. Lost ECC Recovered MCI records
   c. Lost ECC Soft MCI records

Short record buffers (ABREC) may be
overlaid in the following sequence when
all 3 buffers are filled:

1. ABREC for recovered hard MCI
2. ABREC for unrecovered hard MCI

Since there is room in the MCH record
buffer for only one record containing the
fixed and extended logout, a record will be
overlaid if a second interruption is
generated before the previous record has
been recorded. However, to prevent the
complete loss of records, the most critical
parts of each record are saved, and short
records (up to 3) may be generated and
queued in the short record buffer.

If more than three machine checks occur
before recording, previous short error
records are overlaid and a counter is
updated, indicating the number of records
lost. Figure 22 shows the extended and
fixed logout areas, the short record
buffers, the order in which they may be
overlaid, and the location and use of the
lost-record counter.

The record buffer routine exits to the
record formatting routine, which completes
the MCH record by setting up short records
(ABRECS).

Note: See Figure 13 for the format of the
MCH record.

MCH Termination: MCH relinquishes control
at the end of its processing in one of the
following ways:

• It returns control to the interrupted
program if that program was disabled
for I/O at the time of the interruption.
Before returning to the interrupted
program, MCH sets a No-operation
instruction in the Dispatcher, so that
the error record will be posted when
the interrupted program gives up
control.

• It gives control to the nucleus to post
the error record and, upon return, exits
to the Dispatcher (when normal recording
is to be done).

• It gives control to the Emergency
Recorder (IGFMCHE3) when continuation of
system operation is impossible (determined
by a bit set in MCHDAMAG).



PRELIMINARY ERROR ANALYSIS 

Module ID : IGFMCH41 

Functions: The Preliminary Error Analysis
routine examines the machine-check
interruption code and the fixed and extended
logouts to determine the recovery strategy
for MCH.

Operation: Preliminary Error Analysis
(PEA) is the first transient MCH module to
be loaded when the Instruction Processing
Damage (PD) bit of the machine-check
interruption code is found to be on. (When
this bit is on, it indicates that an
instruction or information in a register has
been changed.) PEA determines whether the
machine-check interruption code is valid.
If it is not, PEA sets the "system
termination necessary" bit in the MCHDAMAG
field of the Common Area and designates the
Soft Machine-Check Handler as its successor
(the Soft Machine-Check Handler passes
control to IGFMCHE3 to write MCH records,
and IGFMCHE3 passes control to SHUT to stop
the system).

If the machine-check interruption code
is valid, PEA saves the machine-check new
PSW, replacing it with one that points to a
section of PEA designed to handle all
interruptions. PEA then determines whether
the Instruction Processing Damage (PD) is
the result of a storage error, an SPF key
error, or a hardware-retry-failed error.



If a storage error occurred, PEA does
the following:

1. It checks the machine-check
interruption code for a valid failing
storage address. If the address is
invalid, PEA schedules the Soft
Machine-Check Handler, which terminates
the system.

2. If PEA finds a valid failing storage
address, it stores the beginning and
ending addresses of the doubleword
that includes the location in error in
the REPDARF1 and REPDARF2 fields of
the Independent Common Area.

3. It uses the Store Multiple and Load
instructions to test the failing
location. The following bit patterns are
stored and fetched:

a. All binary 0's

b. All binary 1's

c. Binary 1's and 0's (101010...)

d. Binary 0's and 1's (010101...)

If there is a machine-check interruption 
that can be identified as a result of the 
storing or fetching, or if any bits are 
altered, the solid error switch in REPDAR2 
is turned on. Otherwise, the intermittent 
error switch in REPDAR2 is set since the 
original error cannot be duplicated. PEA 
restores the machine-check new PSW and 
returns control to the MCH Nucleus, which 
loads IGFMCHF1. 
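The solid-or-intermittent test for a storage error can be sketched as below. The sketch makes these assumptions:

- the `store` and `fetch` callables stand in for the Store Multiple and Load instructions, operating on simulated memory;
- each pattern is a 64-bit value covering the failing doubleword;
- a machine check raised by the test is modeled as a Python exception.

All names are hypothetical.

```python
PATTERNS = (
    0x0000000000000000,  # all binary 0's
    0xFFFFFFFFFFFFFFFF,  # all binary 1's
    0xAAAAAAAAAAAAAAAA,  # binary 1's and 0's (1010...)
    0x5555555555555555,  # binary 0's and 1's (0101...)
)


class MachineCheck(Exception):
    """Stands in for a machine-check interruption caused by the test."""


def doubleword_bounds(addr):
    """First and last byte addresses of the doubleword containing addr."""
    start = addr & ~0x7
    return start, start + 7


def classify_storage_error(store, fetch, addr):
    """Return 'solid' if the failure can be reproduced, else 'intermittent'."""
    for pattern in PATTERNS:
        try:
            store(addr, pattern)
            if fetch(addr) != pattern:
                return "solid"        # a bit was altered
        except MachineCheck:
            return "solid"            # the test itself caused a machine check
    return "intermittent"             # original error could not be duplicated


good = {}
print(classify_storage_error(good.__setitem__, good.__getitem__, 0x1000))
```

A location with a bit stuck at 1 fails on the all-zeros pattern and is reported as solid.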

If a Storage Protect Feature (SPF) key 
error occurred, PEA first acts as described 
above in steps 1 and 2 for storage errors. 
PEA then tests the SPF key in the following 
manner to determine whether the error is 
solid or intermittent: 

1. It uses the Set Storage Key (SSK) and
Insert Storage Key (ISK) instructions
to test all possible 4-bit combinations.
If a machine-check interruption occurs on
the execution of either instruction on any
single pattern (that is, if the original
error can be duplicated), the error is
considered to be solid. The error is also
considered to be solid if the bit pattern
changes after it is set in the location
(by SSK) or after it is inserted back into
the register (by ISK).
2. For either type of solid error, the
solid error indicator is set in
REPDAR2. When all bit patterns are
tested without a machine check, the
error is considered to be intermittent.
In this case, the intermittent
indicator is set in REPDAR2, and
control is passed to IGFMCHF1.

If a hardware-retry-failed error occurred,
PEA sets a "termination necessary" bit in
the MCH Common Area and passes control to
IGFMCHF1.
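The SPF key test follows the same pattern as the storage test, except that it cycles through all sixteen 4-bit key values. In this sketch, `set_key` and `insert_key` are hypothetical stand-ins for SSK and ISK, operating on a simulated key store. A machine check during either operation is modeled as an exception.

```python
class MachineCheck(Exception):
    """Stands in for a machine-check interruption during SSK or ISK."""


def classify_key_error(set_key, insert_key, block):
    """Try all sixteen 4-bit key values on the failing storage block."""
    for key in range(16):
        try:
            set_key(block, key)              # models Set Storage Key (SSK)
            if insert_key(block) != key:     # models Insert Storage Key (ISK)
                return "solid"               # pattern changed
        except MachineCheck:
            return "solid"                   # original error duplicated
    return "intermittent"                    # no failure on any pattern
```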



SYSTEM ANALYSIS

There are two sets of system analysis
modules: one set for MFT and one set for
MVT. Only the MFT set is used with MCH for
the Model 135, since only the MFT version of
the operating system may be run on a Model
135. Either the MFT or the MVT set may be
used on the Model 145.

Each set contains three modules. For an
MFT operating system, these modules are:

IGFMFTF1
IGFMFTF2
IGFMFTF3

For an MVT operating system, the
corresponding modules are:

IGFMVTF1
IGFMVTF2
IGFMVTF3

These system analysis modules are called
Program Damage Assessment and Repair (PDAR)
modules. They prepare the system for one
of the following machine-check recovery
procedures:

• Setting nondispatchable those jobsteps
associated with solid storage or SPF
errors, to circumvent the solid error.

• Abnormally terminating a jobstep using
the ABEND routine.

• Placing the system in a wait state when
an error that affects a critical system
task cannot be corrected.

Note: In other parts of this manual, these
PDAR modules are referred to as IGFMCHF1,
IGFMCHF2, and IGFMCHF3 when either MFT or
MVT modules may be used. This is done to
simplify documentation; MFT and MVT modules
are mutually exclusive.



MVT SYSTEM ANALYSIS 1 (MODEL 145 ONLY)

Module ID: IGFMVTF1

Functions: MVT System Analysis 1
determines what error occurred and schedules
the appropriate routine to handle it (one
of the other system analysis modules or the
PDAR Terminator). MVT System Analysis 1
also refreshes any intermittent storage
errors¹ within an SVC or error transient
area and then passes control to the PDAR
Terminator.

¹Although such errors are refreshed
(repaired), the task affected by the error
is still terminated.

Operation: MVT System Analysis 1 receives
control from the Preliminary Error Analysis
module. It gives control to one of the
other system analysis modules or to the
PDAR Terminator as follows:

• When the machine check is a CPU error
(not a storage or SPF error) and I/O
interruptions are disabled for the
interrupted task, IGFMVTF1 indicates,
in REPDAR3 and REPDAR6, that a wait
state message must be issued. When I/O
interruptions are enabled for the task
interrupted by a CPU error, IGFMVTF1
indicates that the task must be
terminated by ABEND. In both cases,
IGFMVTF1 schedules the PDAR Terminator
(IGFMCHF5) as the next routine.

• When the machine check is an
intermittent main storage error (indicated
in REPDAR2) in an SVC or an error
transient area, IGFMVTF1 tries to restore
the original data by loading a fresh
copy of the affected module. Then,
IGFMVTF1 indicates that the interrupted
task must be terminated by ABEND and
schedules the PDAR Terminator as the
next routine. For all other intermittent
storage errors, IGFMVTF1 schedules
System Analysis 2 (IGFMVTF2) as the
next routine.

• When the machine check is a solid main
storage error or an SPF error (either
solid or intermittent, as indicated in
REPDAR2), IGFMVTF1 schedules System
Analysis 3 (IGFMVTF3) as the next
routine.



MVT SYSTEM ANALYSIS 2 (MODEL 145 ONLY)

Module ID: IGFMVTF2

Functions: MVT System Analysis 2
determines which parts of the system are
affected by an intermittent main storage
error (multiple-bit error) and takes
appropriate action.

Note: Single-bit main storage errors are
corrected by hardware and merely require
recording.

Operation: MVT System Analysis 2
determines from flags in the REPDAR field
whether the intermittent main storage error
occurred in a supervisor area (excluding
system transient areas) or the Link Pack
Area (excluding the SVC and LINKLIB BLDL
Tables). If so:

1. IGFMVTF2 indicates the system must be
placed in a wait state and schedules
the PDAR Terminator (IGFMCHF5) as the
next routine when one of the following
conditions also exists:

a. The interrupted task is disabled
for I/O interruptions.

b. The failing storage location is
marked cleared.

c. The instruction counter (at the
time of the interruption)
addresses a location within the
nucleus or Link Pack Area.

2. IGFMVTF2 indicates the interrupted
task must be terminated by the ABEND
routine and schedules Subsystem Analysis
(IGFMCHF6) as the next routine whenever
none of the conditions listed under
1 exists.

When the intermittent main storage error
occurred within the SVC BLDL Table, the
LINKLIB BLDL Table, or both, IGFMVTF2
deletes the affected table(s)² and:

1. Indicates the system is to be placed
in a wait state and schedules the PDAR
Terminator (IGFMCHF5) as the next
routine whenever the interrupted task is
disabled for I/O interruptions.

2. Indicates the interrupted task must be
terminated by the ABEND routine and
schedules Subsystem Analysis
(IGFMCHF6) as the next routine whenever
the interrupted task is enabled for
I/O interruptions.

²When an intermittent main storage error
overlaps an SVC or LINKLIB BLDL Table and
another supervisor location, IGFMVTF2
deletes the affected BLDL table but bases
subsequent action on which other
supervisor locations are affected.

When the intermittent main storage error
occurred within a system transient area,
IGFMVTF2:

1. Indicates the system is to be placed
in a wait state and schedules the PDAR
Terminator (IGFMCHF5) as the next
routine whenever the interrupted task is
disabled for I/O interruptions.

2. Indicates the interrupted task must be
terminated by the ABEND routine and
schedules Subsystem Analysis
(IGFMCHF6) as the next routine whenever
the interrupted task is enabled for
I/O interruptions.
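The IGFMVTF2 choices described above reduce to a small decision table. In this sketch, the area labels and returned tuples are hypothetical, and the REPDAR flag tests are reduced to plain booleans.

```python
def mvt_sa2(area, disabled_for_io, location_cleared=False,
            ic_in_nucleus_or_lpa=False):
    """Return (disposition, next module, BLDL table deleted?) for IGFMVTF2.

    area: "supervisor_or_lpa", "bldl_table" or "system_transient"
    """
    deleted_bldl = area == "bldl_table"
    if area == "supervisor_or_lpa":
        wait = disabled_for_io or location_cleared or ic_in_nucleus_or_lpa
    elif area in ("bldl_table", "system_transient"):
        wait = disabled_for_io
    else:
        raise ValueError(f"area {area!r} is not handled by this sketch")
    if wait:
        return ("wait state", "IGFMCHF5", deleted_bldl)
    return ("ABEND interrupted task", "IGFMCHF6", deleted_bldl)
```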



MVT SYSTEM ANALYSIS 3 (MODEL 145 ONLY) 

Module ID : IGFMVTF3 

Functions: MVT System Analysis 3
determines which parts of the system are
affected by SPF errors (solid or
intermittent) or by solid main storage
errors, and it takes the appropriate action.

Operation: For intermittent SPF key
errors, MVT System Analysis 3 determines
whether the error location associated with
the failing key is within the Nucleus, Link
Pack Area, or dynamic area subpool 252. If
it is, IGFMVTF3 sets the failing key to
zero using a Set Storage Key (SSK)
instruction. If the error is in a dynamic
area subpool other than 252, IGFMVTF3 sets
the failing key to the key value in the TCB
of the interrupted task. In either case,
IGFMVTF3 then indicates the interrupted
task must be terminated by ABEND and
schedules Subsystem Analysis (IGFMCHF6) as
the next routine.

For a solid SPF key error or a solid
main storage error, IGFMVTF3 indicates the
system must be placed in a wait state and
schedules the PDAR Terminator (IGFMCHF5) as
the next routine whenever:

• The storage location associated with
the error is in the Nucleus or Link Pack
Area.

• The storage location associated with
the error is in the dynamic area and
the interrupted task is disabled for
I/O interruptions.

If the error location associated with the
solid error is in the dynamic area and the
interrupted task is enabled for I/O
interruptions, IGFMVTF3 sets the following
nondispatchable:

• The TCB of the interrupted task. 

• The associated jobstep TCB. 

• All subtask TCBs of that jobstep. 

IGFMVTF3 then indicates it took this action 
and schedules Subsystem Analysis (IGFMCHF6) 
as the next routine. 
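The IGFMVTF3 rules can be sketched as a single function. In this sketch:

- the error-kind and location labels are hypothetical;
- subpool 252 is treated as part of the dynamic area;
- the task key is passed in, standing in for the value taken from the interrupted task's TCB.

```python
def mvt_sa3(kind, location, disabled_for_io=False, subpool=None, task_key=None):
    """Sketch of the choices made by IGFMVTF3.

    kind: "intermittent_spf", "solid_spf" or "solid_storage"
    location: "nucleus", "lpa" or "dynamic" (subpool given for the dynamic area)
    """
    if kind == "intermittent_spf":
        if location in ("nucleus", "lpa") or subpool == 252:
            new_key = 0
        else:
            new_key = task_key  # key value from the interrupted task's TCB
        return {"reset_key_to": new_key, "action": "ABEND", "next": "IGFMCHF6"}
    if kind in ("solid_spf", "solid_storage"):
        if location in ("nucleus", "lpa") or disabled_for_io:
            return {"action": "wait state", "next": "IGFMCHF5"}
        return {"action": "nondispatchable: task, jobstep and subtask TCBs",
                "next": "IGFMCHF6"}
    raise ValueError(f"unexpected error kind {kind!r}")
```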
MFT SYSTEM ANALYSIS 1

Module ID: IGFMFTF1

Functions: MFT System Analysis 1
determines what error occurred and schedules
the appropriate routine to handle it (one
of the other system analysis routines or the
PDAR Terminator). It also:

1. Repairs intermittent SPF key errors.

2. Sets nondispatchable the TCB of the
interrupted task (and the TCBs of all
associated tasks when the system uses
subtasking) when the machine check is
a solid SPF key error or a solid main
storage error within the dynamic area.

Operation: MFT System Analysis 1 receives
control from the Preliminary Error Analysis
module. It gives control to one of the
other system analysis modules or to the
PDAR Terminator as follows:

• When the machine check is a CPU error
(not a storage or SPF error) and I/O
interruptions are disabled for the
interrupted task, IGFMFTF1 indicates,
in REPDAR3 and REPDAR6, that a wait
state message must be issued. When I/O
interruptions are enabled for the task
interrupted by a CPU error, this routine
indicates the task must be terminated by
ABEND. In both cases, IGFMFTF1 schedules
the PDAR Terminator (IGFMCHF5) as the
next routine.

• When the machine check is an
intermittent main storage error (indicated
in REPDAR2), the interrupted task is
disabled for I/O interruptions, and
termination is necessary (indicated in
REPDAR1), IGFMFTF1 indicates the system
must be placed in a wait state and
schedules the PDAR Terminator
(IGFMCHF5) as the next routine. For
all other intermittent main storage
errors:

a. When the error location is within
the fixed area of the operating
system, IGFMFTF1 schedules MFT System
Analysis 2 (IGFMFTF2) as the
next routine.

b. When the error is within a dynamic
area, IGFMFTF1 indicates the
interrupted task must be terminated by
ABEND and schedules the PDAR
Terminator (IGFMCHF5) as the next
routine.

• When the machine check is an
intermittent SPF key error, the interrupted
task is disabled for I/O interruptions,
and termination is necessary (indicated
in REPDAR1), IGFMFTF1 indicates the
system must be placed in a wait state
and schedules the PDAR Terminator
(IGFMCHF5) as the next routine. For
all other intermittent SPF key errors:

a. When the error is within the fixed
area, IGFMFTF1 resets the failing
key to 0.

b. When the error is within the dynamic
area, IGFMFTF1 resets the failing
key to the key value in the
interrupted task's TCB.

After resetting the failing key,
IGFMFTF1 indicates that the interrupted
task must be terminated by ABEND and
schedules the PDAR Terminator
(IGFMCHF5) as the next routine.

• When the machine check is a solid
error, in main storage or the SPF key,
and the error is within a fixed area,
IGFMFTF1 schedules IGFMFTF2 as the next
routine. When the solid error is
within a dynamic area, IGFMFTF1:

a. Indicates the system is to be
placed in a wait state and
schedules the PDAR Terminator
(IGFMCHF5) as the next routine
whenever the interrupted task is
disabled for I/O interruptions.

b. Whenever the interrupted task is
enabled for I/O interruptions, sets
nondispatchable the TCB of the
interrupted task and, if the system
is subtasking, sets nondispatchable
the TCBs of all tasks within the
TCB chain headed by the jobstep TCB
that includes the interrupted task.
IGFMFTF1 then indicates it took
this action and schedules the PDAR
Terminator (IGFMCHF5) as the next
routine.



MFT SYSTEM ANALYSIS 2

Module ID: IGFMFTF2

Functions: MFT System Analysis 2 schedules
the appropriate termination procedures for
solid main storage and solid SPF key errors
within the fixed area. It also refreshes
transient-area modules damaged by intermittent
main storage errors, deletes BLDL tables
containing intermittent storage errors, or
schedules IGFMFTF3 to handle intermittent
storage errors in other locations.

Operation: MFT System Analysis 2 is
entered by IGFMFTF1 for machine checks in 
the supervisor (fixed area) that are either 
intermittent main storage errors or solid 
errors in either main storage or the SPF 
key. 
When the machine check is a solid error,
IGFMFTF2 indicates the system is to be 
placed in a wait state and schedules the 
PDAR Terminator (IGFMCHF5) as the next 
routine. 



When the machine check is an intermittent
main storage error within an SVC or error
transient area, IGFMFTF2 tries to restore
(refresh) the damaged module by loading a
new copy. When this refresh attempt is
successful, IGFMFTF2 schedules the PDAR
Terminator (IGFMCHF5) as the next routine
after doing one of the following:



• Indicating the system must be placed in 
a wait state when the interrupted task 
is disabled for I/O interruptions. 



• Indicating the interrupted task is to 
be terminated by ABEND when it is 
enabled for I/O interruptions. 



When the refresh attempt is unsuccessful: 

1. IGFMFTF2 indicates the system must be 
placed in a wait state and schedules 
the PDAR Terminator (IGFMCHF5) as the 
next routine when one of the following 
conditions also exists: 

a. The interrupted task is disabled 
for I/O interruptions. 

b. The failing storage location is 
marked cleared. 

c. The instruction counter (at the 
time of the interruption) 
addresses a location within the 
supervisor. 

2. IGFMFTF2 indicates the interrupted 
task must be terminated by ABEND and 
schedules the PDAR Terminator 
(IGFMCHF5) as the next routine whenev- 
er the conditions listed under 1 do 
not exist. 

When the machine check is an intermit- 
tent main storage error within the SVC BLDL 
Table, the LINKLIB BLDL Table, or both, 
IGFMFTF2 deletes the affected table(s) and:

1. Indicates the system is to be placed 
in a wait state when the interrupted 
task is disabled for I/O
interruptions.

2. Indicates the interrupted task is to 
be terminated by ABEND when it is 
enabled for I/O interruptions. 

In either case, IGFMFTF2 schedules the PDAR
Terminator (IGFMCHF5) as the next routine.*

When the machine check is an intermit- 
tent main storage error not mentioned above 
(that is, not within an SVC or error tran- 
sient area, the SVC BLDL Table, or the 
LINKLIB BLDL Table), IGFMFTF2 schedules MFT
System Analysis 3 (IGFMFTF3) as the next 
routine. 
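The IGFMFTF2 decisions described above can be summarized as a small decision function. This is a minimal Python sketch, not the module's code: the area labels ("transient", "bldl", "other") and the `FixedAreaError` fields are hypothetical stand-ins for the indicators MCH actually keeps in REPDAR and the MCH Common Area, and the footnoted case of an error that overlaps both a BLDL table and another supervisor location is not modeled.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Next(Enum):
    WAIT_STATE = auto()  # PDAR Terminator (IGFMCHF5) prepares a wait-state Resume PSW
    ABEND_TASK = auto()  # PDAR Terminator (IGFMCHF5) terminates the task by ABEND
    IGFMFTF3 = auto()    # MFT System Analysis 3 handles the error


@dataclass
class FixedAreaError:
    solid: bool                     # True for a solid error, False for intermittent
    area: str                       # hypothetical labels: "transient", "bldl", "other"
    io_disabled: bool               # interrupted task disabled for I/O interruptions
    refresh_ok: bool = False        # did reloading the transient module succeed?
    location_cleared: bool = False  # failing storage location marked cleared
    ic_in_supervisor: bool = False  # instruction counter addresses the supervisor


def mft_system_analysis_2(err: FixedAreaError) -> Next:
    """Hypothetical model of the IGFMFTF2 choices for fixed-area errors."""
    if err.solid:
        return Next.WAIT_STATE
    if err.area == "transient":
        if err.refresh_ok:
            return Next.WAIT_STATE if err.io_disabled else Next.ABEND_TASK
        # Refresh failed: any one of these conditions forces a wait state.
        if err.io_disabled or err.location_cleared or err.ic_in_supervisor:
            return Next.WAIT_STATE
        return Next.ABEND_TASK
    if err.area == "bldl":
        # The affected SVC and/or LINKLIB BLDL table is deleted first (not modeled).
        return Next.WAIT_STATE if err.io_disabled else Next.ABEND_TASK
    return Next.IGFMFTF3
```

Passing the conditions in as plain booleans keeps the order of the tests visible; the real module derives them from the interrupted PSW and the REPDAR fields.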



MFT SYSTEM ANALYSIS 3 

Module ID: IGFMFTF3

Functions: MFT System Analysis 3 indicates
the system is to be placed in a wait state
and passes control to the PDAR Terminator
(IGFMCHF5).

PDAR TERMINATOR 

Module ID: IGFMCHF5

Functions: When system analysis routines
request it, the PDAR Terminator does one of
the following:

• Prepares the job step associated with 
the interrupted task for abnormal ter- 
mination by ABEND. 

• Prepares a Resume PSW that will put the 
system into a wait state. 

In either case, and when system analysis
has set the interrupted task nondispatchable,
the PDAR Terminator also schedules the
appropriate message.

Operation: The routines that call the PDAR
Terminator (system analysis modules and, in
MVT systems only, TSO recovery routines)
indicate one of the following actions:

1. Terminate the interrupted task using 
ABEND. 

2. The interrupted task and associated
tasks already have been set nondispatchable.

3. Place the system in a wait state. 

4. An entire TSO subsystem already has
been terminated by ABEND or set nondispatchable
(only possible on MVT systems when TSO is
affected by the machine check).

5. A TSO user already has been terminated
by ABEND or set nondispatchable (only
possible on MVT systems when TSO is
affected by the machine check).

*When an intermittent main storage error
overlaps an SVC or LINKLIB BLDL Table and
another supervisor location, IGFMFTF2
deletes the affected table but bases
subsequent action on the other supervisor
location affected.

When the request is to terminate using
ABEND, IGFMCHF5 calls ABTERM, passing an
ABEND code and the address of the
interrupted task's TCB. ABTERM sets up the
interrupted task for ABEND and returns with
the address of the ABEND SVC in the Resume
PSW field of the task's Request Block (RB).
IGFMCHF5 moves this address to the MCH
Resume PSW so that it will also be set up
for ABEND when MCH returns control to the
operating system.

When a task has been set nondispatch- 
able, IGFMCHF5 moves the Resume PSW from 
the interrupted task's RB to the MCH Resume 
PSW so that, upon exit from MCH, the system 
Resume PSW will not be changed. 

When the request is to place the system 
in a wait state, IGFMCHF5 moves a wait 
state Resume PSW into the MCH Resume PSW. 

When TSO is affected, either required 
termination has already been done or a sys- 
tem analysis module has indicated that one 
of the previously described actions must be 
performed. 

The PDAR Terminator always sets up a 
descriptive message in the message buffer 
MCHIBUF. This can consist of a message 
code only, a message code and associated 
text, or a message code, text, and a wait 
state code. 

The PDAR Terminator always schedules the 
Soft Machine-Check Handler as the next 
routine. 
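As a rough illustration, the choice of MCH Resume PSW for the three basic requests can be sketched in Python. The names `Request`, `RequestBlock`, `Message`, `PdarResult`, and `pdar_terminator` are hypothetical, PSWs are reduced to integers, ABTERM is passed in as a callable, and the TSO cases (requests 4 and 5) are left out because the text does not say which Resume PSW they use.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Request(Enum):
    ABEND = 1            # terminate the interrupted task using ABEND
    NONDISPATCHABLE = 2  # task(s) already set nondispatchable
    WAIT = 3             # place the system in a wait state


@dataclass
class RequestBlock:
    resume_psw: int      # stand-in for the Resume PSW field of the interrupted task's RB


@dataclass
class Message:
    code: str                        # message code only...
    text: Optional[str] = None       # ...optionally with text...
    wait_code: Optional[str] = None  # ...and optionally a wait state code


@dataclass
class PdarResult:
    mch_resume_psw: int
    mchibuf: Message
    next_routine: str = "SOFT_MCH"   # the Soft Machine-Check Handler is always next


def pdar_terminator(request: Request, rb: RequestBlock, wait_psw: int,
                    abterm: Callable[[RequestBlock], None],
                    message: Message) -> PdarResult:
    if request is Request.ABEND:
        abterm(rb)            # ABTERM points the RB Resume PSW at the ABEND SVC
        psw = rb.resume_psw   # copy it so the MCH exit also goes to ABEND
    elif request is Request.NONDISPATCHABLE:
        psw = rb.resume_psw   # copied unchanged, so the system Resume PSW is unaltered
    else:
        psw = wait_psw        # a wait-state Resume PSW
    return PdarResult(psw, message)
```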



TSO SUBSYSTEM ANALYSIS (MODEL 145 ONLY) 

Module ID: IGFMCHF6

Functions: The PDAR Subsystem Analysis
module is the interface between MCH and the 
Time Sharing Option (TSO) subsystem. This 
module receives control when an uncorrect- 
able main storage error or uncorrectable 
SPF Key error occurs in the dynamic area. 

Operation: Upon entry, Subsystem Analysis
(IGFMCHF6) tests a flag in the Communications
Vector Table (CVT) to determine whether TSO is
active. If it is not, IGFMCHF6 exits to
the PDAR Terminator for normal termination
processing.

If TSO is active and the error is in
the Local System Queue Area (LSQA),
IGFMCHF6 schedules the system to be placed
in a disabled wait state by setting a
request bit in REPDAR and passing control to
IGFMCHF5 via the module loader. If the
error is in the TSO Link Pack Area rather
than the LSQA, IGFMCHF6 stores the
appropriate information in MCHSUB fields
for the TSO Recovery Module.

If neither of the above areas is 
affected, IGFMCHF6 determines whether the 
error affects TSO in any other area by com- 
paring the address of the error with TSO 
region addresses located in the TSO extent 
list queue. If the error does affect an 
area in TSO, IGFMCHF6 stores the appropri- 
ate information in the MCH Subsystem Common 
Area for the TSO Recovery routine 
(IGFMCH91). Control is then passed to that 
routine via the Module Scheduler. 

If TSO is not affected in any manner, 
IGFMCHF6 exits to the PDAR Terminator via 
the Module Scheduler. 
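The address comparisons IGFMCHF6 makes can be pictured as interval-overlap tests. The sketch below is a hypothetical model: extents are (first, last) byte pairs, the LSQA, TSO Link Pack Area, and TSO region extents are passed in as lists rather than read from the TSO extent list queue, and the returned strings only name the outcome.

```python
from typing import Sequence, Tuple

Extent = Tuple[int, int]  # (first byte, last byte), inclusive


def overlaps(a: Extent, b: Extent) -> bool:
    """True when two inclusive address ranges share at least one byte."""
    return a[0] <= b[1] and b[0] <= a[1]


def subsystem_analysis(tso_active: bool, error: Extent,
                       lsqa: Sequence[Extent], tso_lpa: Sequence[Extent],
                       tso_regions: Sequence[Extent]) -> str:
    if not tso_active:
        return "PDAR_TERMINATOR"           # normal termination processing
    if any(overlaps(error, e) for e in lsqa):
        return "WAIT_VIA_PDAR_TERMINATOR"  # disabled wait requested in REPDAR
    if any(overlaps(error, e) for e in tso_lpa):
        return "TSO_RECOVERY_LPA"          # information stored in MCHSUB fields
    if any(overlaps(error, r) for r in tso_regions):
        return "TSO_RECOVERY_IGFMCH91"     # information stored for IGFMCH91
    return "PDAR_TERMINATOR"               # TSO not affected
```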



ERROR RECORDER 

Module ID: IGFMCHE2

Functions: The Error Recorder writes either
a short or long record in the SYS1.LOGREC
data set.

Operation: The Error Recorder scans the
record buffers and, when an active buffer
is located, determines whether the record
in it is a short or a long record. Before
doing any I/O, the recorder enqueues on the
SYS1.LOGREC recording queue to ensure sole
access while recording. The Error Recorder
then checks the header record of the
SYS1.LOGREC data set to ensure that recording
is possible and, if so, it writes the error
record into the data set. After one record
has been written, the Error Recorder scans
the remaining buffers, repeating the process
until all active short record buffers have
been written out. Finally, the recorder
enables the system for soft machine-check
interruptions if that is the normal system
status.

Note: For recording to be possible, a flag
byte at the end of the SYS1.LOGREC header
record must contain X'FF'. If this flag
byte contains any other value, the Disk
Format Error Message is scheduled by turning
on a bit in MCHINT, and control passes
to IGFMCHE1.

The Error Recorder builds CCWs in the
recorder portion of the MCH Common Area
(MCHWORK). It issues an EXCP macro
instruction to perform the I/O.

Note: The Error Recorder builds its CCWs
in a common area because the transient area
may be overlaid before the requested I/O
completes.
When the SYS1.LOGREC data set is full,
or when a disk format or I/O error occurs,
this routine sets a flag in MCHINT to
indicate these conditions to the Console
Write routine (IGFMCHE1).

When all active records have been writ- 
ten, the Error Recorder passes control to 
the Console Write routine via XCTL. 
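A simplified model of the Error Recorder loop follows. It assumes a LOGREC whose free space is counted in bytes (the real space calculation is device dependent), omits the ENQ, the CCWs, and EXCP, and uses hypothetical MCHINT bit values; only the X'FF' header-flag rule and the full and format-error reporting come from the text.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical bit assignments in MCHINT; the real layout is given in Section 4.
LOGREC_FULL = 0x80
DISK_FORMAT_ERROR = 0x40


@dataclass
class RecordBuffer:
    active: bool
    record: bytes


@dataclass
class Logrec:
    header_flag: int        # last byte of the header record; X'FF' means recording is possible
    bytes_free: int
    written: List[bytes] = field(default_factory=list)


def error_recorder(buffers: List[RecordBuffer], logrec: Logrec) -> int:
    """Write every active buffer; return the MCHINT flags for Console Write."""
    mchint = 0
    for buf in buffers:
        if not buf.active:
            continue
        if logrec.header_flag != 0xFF:
            return mchint | DISK_FORMAT_ERROR   # Disk Format Error Message
        if len(buf.record) > logrec.bytes_free:
            return mchint | LOGREC_FULL         # SYS1.LOGREC full
        logrec.written.append(buf.record)
        logrec.bytes_free -= len(buf.record)
        buf.active = False                      # buffer has been written out
    return mchint
```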



CONSOLE WRITE ROUTINE 



Module ID: IGFMCHE1



Functions: The MCH Console Write routine
uses SVC 35 to write messages to the
operator.

Operation: The Console Write routine scans
MCHIBUF for a SYS1.LOGREC full or I/O error
message, or a disk format error message (if
indicated in the MCHINT field). When any of
these priority messages is found, the
Console Write routine issues an SVC 35 (the
WTO routine) to write it. Then the Console
Write routine issues SVC 35 to write any
remaining messages (such as normal messages
scheduled by other MCH routines).

This routine must handle two kinds of
messages: preformatted and dynamic. A
preformatted message remains the same each
time it is issued. For this type of message,
the routine requesting the message places a
code in MCHIBUF. The code is matched to its
corresponding message in the message table
contained in the Console Write routine, and
the Console Write routine issues an SVC 35
specifying the address of that message in
the table.

A dynamic message is placed directly
into MCHIBUF by the routine requesting the
message. Because a dynamic message contains
unique data each time it is issued, it
cannot reside in the Console Write message
table. Each time the Console Write routine
issues an SVC 35 for a dynamic message,
register 1 specifies the address of the
message in MCHIBUF.



MACHINE STATUS CONTROL (MODEL 145 ONLY)

Module ID: IGF29701

Functions: The Machine Status Control
routine allows the operator to change the
mode of recording soft machine-check
interruptions for either main storage or
control storage.

Operation: This routine analyzes the
command parameters entered by the operator
and places main storage or control storage
into the appropriate mode, using a Diagnose
instruction. The format of the MODE command
is:

     MODE {CNTR,RECORD/QUIET/THRES}
          {MAIN,RECORD/QUIET}

Note: The one combination which cannot be
issued is MODE MAIN,THRES.

See "Modes of Recovery Operation" in
Section 1 for descriptions of MCH recovery
operation when control and main storage are
in the different modes.

In all cases, the Machine Status Control
routine writes message IGF061I to indicate
that mode switching has been completed and
returns control to the supervisor by
issuing an SVC 3.

WARNING: This MODE command is intended for
use only by IBM personnel or at their
request. Use of this MODE command can
degrade system performance.



EMERGENCY RECORDER

Module ID: IGFMCHE3

Functions: The Emergency Recorder writes
short MCH records, long MCH records, or a
channel error record when either MCH or CCH
determines that the system cannot continue.
These records are written into the
SYS1.LOGREC data set prior to system
termination.

Operation: The Emergency Recorder performs
the same tasks as the Error Recorder, except
that it uses the Module Loader in the MCH
Nucleus to write the record into
SYS1.LOGREC rather than using the operating
system I/O facilities (that is, rather than
passing control to the system via XCTL).

A channel error inboard record, constructed
by CCH, is recorded using the address of the
record passed by CCH. The address of the
record is originally passed in register 13
and is stored in the MCH Common Area by
IGFMCHE0.



MACHINE STATUS CONTROL (MODEL 135 ONLY) 

Module ID: IGF13501

Functions: The Machine Status Control
routine allows the operator to change the
mode of recording soft machine-check
interruptions for either storage or CPU
errors from quiet mode to recording mode. It
also allows the operator to display the
current mode status.
Operation: This routine analyzes the MODE
command parameters entered by the operator
and places main storage or control storage
into recording mode using a Diagnose
instruction (or it displays mode status).
The format of the MODE command is:



     MODE {STATUS    }
          {HIR RECORD}
          {ECC RECORD}



This routine is entered from the Mode Com- 
mand Router module, IGF2603D, when the MODE 
command is specified by the operator. 

See "Use of the Mode Commands" in Section 1
for descriptions of MCH recovery operation
involving the different modes.



When mode switching is requested, this 
routine issues message IGF061I to indicate 
that the mode switch is complete. When 
mode status is requested, this routine 
issues message IGF053I, which has the fol- 
lowing format: 



     MODE STATUS-ECC {RECORD} HIR {RECORD} COUNT-nn THRESHOLD-nn
                     {QUIET }     {QUIET }



When a switch to ECC record mode is 
requested while CPU retry is still in quiet 
mode, it is considered an error, and mes- 
sage IEE305I is issued. 

In all cases, the Machine Status Control 
routine returns to the supervisor by issu- 
ing an SVC 3. 
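The Model 135 MODE command handling, including the rule that ECC record mode may not be requested while CPU retry (HIR) is still in quiet mode, can be sketched as follows. Operand parsing (comma or blank separators), the mode strings, and the "ERROR" result for other operands are assumptions; the message identifiers are the ones named above.

```python
import re
from typing import Tuple


def mode_135(operand: str, hir: str, ecc: str) -> Tuple[str, str, str]:
    """Return (message_id, new_hir_mode, new_ecc_mode) for a Model 135 MODE command."""
    words = tuple(w for w in re.split(r"[,\s]+", operand.strip().upper()) if w)
    if words == ("STATUS",):
        return "IGF053I", hir, ecc          # display only; modes unchanged
    if words == ("HIR", "RECORD"):
        return "IGF061I", "RECORD", ecc     # CPU retry into record mode
    if words == ("ECC", "RECORD"):
        if hir == "QUIET":
            return "IEE305I", hir, ecc      # rejected: CPU retry still in quiet mode
        return "IGF061I", hir, "RECORD"
    return "ERROR", hir, ecc                # hypothetical: other operands take the error exit
```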
FLOWCHARTS



The flowcharts are arranged in the order in which MCH modules are described in this
section. The chart heading for each set of flowcharts corresponds to a module ID.

Subroutine blocks contain the name of the subroutine on the first line of the block.
If the subroutine is flowcharted in this manual, its entry page and block number appear
above and to the right of the subroutine block.

Some of the code represented in these charts should never be executed; such invalid
logic paths are identified by notes.
Note: The following terms appear inside some decision blocks and are defined here:

TERM NECESSARY: Either system or task termination has been requested by IGFMCH41 by
setting REPDAR1, bit 0, to 1 (see Section 4, "MCH Independent Common Area").

INHIBIT TERMINATION ON or REQUESTED: The "NO" branch must always be taken.

RETRY ON: The "NO" branch must always be taken.



IGFMCHF0. MCH Initialization (Part 1 of 2)

(Flowchart not reproduced. Surviving block text: "... SOFT MACHINE-CHECK HANDLER TO
MBBCCHHR FORMAT"; "BUILD IOB, CCW'S TO LOAD SOFT MACHINE-CHECK HANDLER".)

NOTE 1: THIS FLOWCHART REFLECTS ONLY THE LOGIC APPLICABLE TO THE MODELS 135 AND 145.
IGFMCHF0. MCH Initialization (Part 2 of 2)

IGFMCHE0. MCH Nucleus (Part 1 of 9)

IGFMCHE0. MCH Nucleus (Part 2 of 9)
NOTE 1: BLOCK G1 TESTS VALIDITY BITS 20, 21, AND 29 OF THE MACHINE-CHECK INTERRUPTION
CODE. IF ALL THESE BITS ARE ON, THE AMWP, SYSTEM MASK, AND KEY (OF THE OLD PSW) ARE
VALID AND THE CONTROL REGISTER SAVE AREA ACCURATELY REFLECTS THE CONDITION OF THE
CONTROL REGISTERS AT THE TIME OF THE INTERRUPTION.

... AND 31 OF THE MCIC. IF ALL ARE ON, THE PROGRAM MASK, CONDITION CODE, AND
INSTRUCTION ADDRESS OF THE OLD PSW ARE VALID; THE FLOATING-POINT AND GENERAL REGISTER
SAVE AREAS ACCURATELY REFLECT REGISTER CONTENTS AT THE INTERRUPTION; AND THOSE STORAGE
LOCATIONS THAT ARE MODIFIED BY THE INSTRUCTION PROCESSING STREAM CONTAIN CORRECT
INFORMATION RELATIVE TO THE POINT OF THE INTERRUPTION.
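Block G1's test can be expressed with bit arithmetic. System/370 numbers the 64 bits of the machine-check interruption code from 0 at the leftmost (most significant) position, so bit n is found by shifting right 63 - n places. The helper names below are illustrative only.

```python
def mcic_bit(mcic: int, bit: int) -> bool:
    """Return MCIC bit `bit`, numbered 0 (leftmost) through 63 (rightmost)."""
    if not 0 <= bit <= 63:
        raise ValueError("MCIC bit numbers run from 0 to 63")
    return bool((mcic >> (63 - bit)) & 1)


def psw_key_and_cr_save_valid(mcic: int) -> bool:
    """Block G1's test: validity bits 20, 21, and 29 must all be on."""
    return all(mcic_bit(mcic, b) for b in (20, 21, 29))
```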
IGFMCHE0. MCH Nucleus (Part 3 of 9)

NOTE 1: FOR SOME MODELS, A STORAGE PROTECT KEY ERROR MAY NOT TURN ON THE MCIC BIT
FOR SPF ERROR.

NOTE 2: EXT DAMAGE CAN OCCUR TO A CHANNEL, CHANNEL CONTROLLER, SWITCHING UNIT, OR
OTHER UNIT EXTERNAL TO THE CPU, OR TO A STORAGE UNIT NOT DIRECTLY ASSOCIATED WITH
THE CPU.

NOTE 3: OR DAMAGE TO LOCATION X'50'.

NOTE 4: WHEN BIT 7 OF THE MACHINE-CHECK INTERRUPTION CODE IS 1, THE OPERATION OF A
PORTION OF THE MACHINE WHICH CANNOT BE NOTED BY THE PROGRAM HAS BEEN DISABLED BY AN
EQUIPMENT MALFUNCTION. REDUCED PERFORMANCE WILL RESULT EVEN THOUGH LOGICAL OPERATION
MAY CONTINUE.
IGFMCHE0. MCH Nucleus (Part 4 of 9)

(Flowchart not reproduced. Surviving block text: "SET SYSTEM TERM NECESSARY".)

PROGCHEK IS ENTERED VIA THE PROGRAM CHECK NEW PSW WHEN A PROGRAM CHECK OCCURS DURING
THE HANDLING OF A MACHINE CHECK (SEE 01F3 FOR THE SETTING OF THE PGM NEW PSW).

THIS SPECIAL ENTRY SCREENS OUT AND IGNORES ANY PROGRAM CHECKS CAUSED BY MONITOR CALL
INSTRUCTIONS (OPCODE X'AF' IS INVALID ON THE MODELS 135 AND 145).

IGFMCHE0. MCH Nucleus (Part 5 of 9)
IGFMCHE0. MCH Nucleus (Part 6 of 9)

IGFMCHE0. MCH Nucleus (Part 7 of 9)

IGFMCHE0. MCH Nucleus (Part 8 of 9)

IGFMCHE0. MCH Nucleus (Part 9 of 9)
IGFMCH40. Model 145 Soft Machine-Check Handler (Part 1 of 4)

(Flowchart not reproduced. Surviving connector text: "VIA MODULE SCHEDULER IN MCH
NUCLEUS (ENTRY NMODSCED)".)

IGFMCH40. Model 145 Soft Machine-Check Handler (Part 2 of 4)

IGFMCH40. Model 145 Soft Machine-Check Handler (Part 3 of 4)

IGFMCH40. Model 145 Soft Machine-Check Handler (Part 4 of 4)

IGFMCH50. Model 135 Soft Machine-Check Handler (Part 1 of 4)
NOTE 1: THIS BLOCK TESTS A BIT IN MCHDAMAG RATHER THAN A BIT IN REPDAR1 (SEE
INTRODUCTION TO THESE FLOWCHARTS FOR THIS USE OF REPDAR1). WHEN THE MCHDAMAG BIT
INDICATING TERMINATION IS ON, THE SYSTEM (AND NOT JUST A PROBLEM PROGRAM) MUST BE
TERMINATED.

EFL IS AN ERROR FREQUENCY LIMIT BUILT INTO THE HARDWARE THAT ISSUES A MACHINE CHECK
WHEN SOLID, SINGLE-BIT ERRORS REACH THE RATE OF 256 SINGLE-BIT ERRORS IN 416
MICROSECONDS.
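The EFL threshold works out to an average of one single-bit error every 416 / 256 = 1.625 microseconds. How the hardware counts errors is not described here, so the Python sketch below is only one possible software model, a sliding window over error timestamps; the class name and interface are hypothetical.

```python
from collections import deque


class ErrorFrequencyLimit:
    """Sliding-window model: trip when `limit` errors fall within `window_us`."""

    def __init__(self, limit: int = 256, window_us: float = 416.0) -> None:
        self.limit = limit
        self.window_us = window_us
        self._times: deque = deque()

    def record_error(self, t_us: float) -> bool:
        """Log a single-bit error at time t_us; return True once the limit is reached."""
        self._times.append(t_us)
        # Discard errors that are window_us or more older than the newest one.
        while t_us - self._times[0] >= self.window_us:
            self._times.popleft()
        return len(self._times) >= self.limit
```

With errors one microsecond apart the 256th error trips the limit; with errors two microseconds apart at most 208 fall in any 416-microsecond window, so the model never trips.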
IGFMCH50. Model 135 Soft Machine-Check Handler (Part 2 of 4)

IGFMCH50. Model 135 Soft Machine-Check Handler (Part 3 of 4)

IGFMCH50. Model 135 Soft Machine-Check Handler (Part 4 of 4)

IGFMCH41. Preliminary Error Analysis
(Flowchart not reproduced. Surviving block text: "EXERCISE FAILING LOCATION WITH BIT
PATTERN"; "GET NEXT STORAGE KEY BIT PATTERN AND EXERCISE THE FAILING KEY"; "SET SPF
KEY TO ZERO AND INDICATE INTERMITTENT SPF ERROR".)
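The chart blocks suggest that IGFMCH41 writes test patterns into the failing location (or key) and checks them. As a hedged illustration of that technique only, the sketch writes each pattern, reads it back, and treats any mismatch as a solid error; the pattern set and the classification rule are assumptions, and a real handler would also have to preserve the original contents of the location.

```python
from typing import Callable, Iterable

# Hypothetical pattern set: all zeros, all ones, and two alternating patterns.
PATTERNS = (0x00, 0xFF, 0xAA, 0x55)


def exercise_location(write: Callable[[int], None], read: Callable[[], int],
                      patterns: Iterable[int] = PATTERNS) -> str:
    """Write each pattern and read it back; any mismatch is treated as solid."""
    for pattern in patterns:
        write(pattern)
        if read() != pattern:
            return "solid"
    return "intermittent"
```

A cell with a bit stuck at 1 fails on the all-zeros pattern and is classified as solid; a cell that returns every pattern correctly is classified as intermittent.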
IGFMVTF1. MVT System Analysis 1 (Part 1 of 2)

(Flowchart not reproduced. Surviving block text: "INDICATE TYPE OF TASK (SEE NOTE 1)";
"SCHEDULE SYSTEM ANALYSIS 3 (IGFMVTF3) AS SUCCESSOR"; "SCHEDULE SUPERVISOR ERROR
MESSAGE"; "SCHEDULE PDAR TERMINATION (IGFMCHF5) AS SUCCESSOR".)

NOTE 1: WAIT PSEUDO TASK, MASTER SCHEDULER, PROBLEM PROGRAM TASK, OR SYSTEM TASK.

NOTE 2: INVALID PATH FOR MODEL 145. SHOULD NOT BE EXECUTED.
IGFMVTF1. MVT System Analysis 1 (Part 2 of 2)

(Flowchart not reproduced. Surviving block text: "REQUEST BLDL DELETE ABEND MESSAGE";
"INDICATE ERROR IN SVC TRANS AREA"; "SCHEDULE PDAR TERMINATOR (IGFMCHF5) AS
SUCCESSOR"; "SCHEDULE SYSTEM ANALYSIS 2 (IGFMVTF2) AS SUCCESSOR"; "PUT A 1 IN MCHNXMOD
TO INDICATE INVALID SUCCESSOR".)

SVC TRANSIENT AREA IS REFRESHED WHEN POSSIBLE, BUT THE AFFECTED TASK IS STILL
TERMINATED BY MCH.
IGFMVTF2. MVT System Analysis 2 (Part 1 of 2)

ENTERED BY IGFMVTF1.

(Flowchart not reproduced. Surviving block text: "CHECK EACH BLDL TABLE AND REPAIR
ERROR IF ERROR FOUND"; "REPAIR THE FAILING AREA USING CHECKSUM TECHNIQUES"; "PUT 1 IN
MCHNXMOD TO SCHEDULE INVALID SUCCESSOR".)
IGFMVTF2. MVT System Analysis 2 (Part 2 of 2)

(Flowchart not reproduced. Surviving block text: "SCHEDULE PDAR TERMINATION (IGFMCHF5)
AS SUCCESSOR"; "PUT 3 IN MCHNXMOD TO SCHEDULE INVALID SUCCESSOR"; "SCHEDULE SUBSYSTEM
ANALYSIS (IGFMCHF6) AS SUCCESSOR"; "REQUEST APPROPRIATE BLDL MESSAGE"; "REQUEST BOTH
BLDL MESSAGES (IGF023I AND IGF063I)".)
IGFMVTF3. MVT System Analysis 3

IGFMFTF1. MFT System Analysis 1 (Part 1 of 2)
IGFMFTF1. MFT System Analysis 1 (Part 2 of 2)

(Flowchart not reproduced. Surviving block text: "DETERMINE 2K BLOCK WITH ERROR"; "GET
ADDR OF ERR AND CORRECT BAD DATA"; "RESET NUCLEUS SPF AND INDICATE SUCCESSFUL REPAIR";
"SCHEDULE PDAR TERMINATOR (IGFMCHF5) AS SUCCESSOR".)
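The "DETERMINE 2K BLOCK WITH ERROR" step reflects the System/370 assignment of one storage-protection key to each 2048-byte block. A minimal sketch of the arithmetic, with a hypothetical function name, is:

```python
from typing import Tuple

BLOCK_SIZE = 2048  # one storage-protection key per 2K block


def key_block(address: int) -> Tuple[int, int, int]:
    """Return (block number, first address, last address) of the 2K block holding `address`."""
    if address < 0:
        raise ValueError("address must be non-negative")
    number = address // BLOCK_SIZE
    first = number * BLOCK_SIZE
    return number, first, first + BLOCK_SIZE - 1
```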
IGFMFTF2. MFT System Analysis 2

IF THE ERROR CROSSES TABLE BOUNDARY INTO NON-BLDL AREA, TABLE IS DELETED AND SEARCH
FOR OTHER AREA AFFECTED (AT TRANS 1) TAKES PRIORITY.

IGFMFTF3. MFT System Analysis 3



IGFMCHF5. PDAR Terminator (Part 1 of 2)

(Flowchart not reproduced. Surviving block text: "SET UP MESSAGE IN BUFFER AS
REQUESTED"; "SET RESUME PSW"; "SCHEDULE SOFT MACHINE-CHECK HANDLER AS SUCCESSOR";
"FAILING ADDR CONVERTED FOR PRINT IN MSG"; "DETERMINE STORAGE ERROR TYPE"; "REQUEST
MESSAGE IGF001 AND UPDATE BUFFER POINTER TO NEXT SLOT"; "WAIT STATE RESUME PSW TO
RESUME PSW"; "SET SYS. TERM INDIC. IN MCHDAMAG AND SHUT INDIC. MCHINTEL".)
IGFMCHF5. PDAR Terminator (Part 2 of 2)

(Flowchart not reproduced. Surviving block text: "REQUEST IGF023 LINKLIB BLDL DLT
MESSAGE"; "SCHEDULE IGF063I SVCLIB BLDL DLT MESSAGE"; "LOAD ADDRESS OF ABTERM";
"SCHEDULE NONDISPATCHABLE MESSAGE"; "ABTERM PROCESSING (NOTE 2)"; "RESTORE GPR AND SET
ABEND INDICATOR"; "SET RESUME PSW".)

NOTE 2: THE ADDRESS OF ABTERM IS FOUND IN THE CVT. ABTERM IS A TYPE 1 SVC THAT POSTS
THE TCB WITH THE COMPLETION CODE, SETS UP THE INTERRUPTED TASK FOR ABEND, AND POINTS
THE RESUME PSW OF THE INTERRUPTED TASK'S RB TO THE ABEND SVC.
IGFMCHF6. TSO Subsystem Analysis

RECEIVES CONTROL FROM IGFMCHF2 OR IGFMCHF3 WHEN DYNAMIC STORAGE FAILURE OCCURS.

(Flowchart not reproduced. Surviving block text: "SET WAIT MESSAGE CODE IN REPDAR6";
"SCHEDULE PDAR TERMINATOR (IGFMCHF5) AS SUCCESSOR"; "INDICATE TSO ACTIVE. STORE BEG. &
END ADDR OF ERR, STORE MC OLD PSW"; "SET SOLID STORAGE ERROR INDICATOR IN MCHIN1";
"INTERMITTENT STORAGE ERROR INDICATOR IN MCHIN1"; "SCHEDULE TSO SUBSYSTEM MODULE
(IGFMCH91) AS SUCCESSOR".)
IGFMCHE2. Error Recorder (Part 1 of 4)

ENTERED VIA XCTL FROM COMMUNICATIONS TASK ROUTER.

(Flowchart not reproduced. Surviving block text: "SET UP PARAMETER REG. WITH ADDR OF
MCHINT"; "ENABLE SOFT MACHINE-CHECK INTERRUPTS IF THAT WAS STATUS BEFORE MCI"; "SET
SHORT RECORD FLAG AND TURN OFF CLOBBER BIT"; "BUILD CCW'S IN MCH WORK AREA (SEE NOTE
1)"; "ENQUEUE SELF (IGFMCHE2) ON SYS1.LOGREC (NOTE 2)"; "EXCPCALL 04A2".)

NOTE 1: CCW'S ARE BUILT IN MCHWORK OF THE COMMON AREA SINCE IGFMCHE2 IS A TRANSIENT
ROUTINE. CCW'S ARE TO READ HEADER RECORD FROM SYS1.LOGREC.
IGFMCHE2. Error Recorder (Part 2 of 4)

(Flowchart not reproduced. Surviving block text: "TURN ON FLAG TO INDICATE HEADER
RECORD FORMAT ERROR"; "GET LENGTH OF SHORT REC (ABREC), MOVE ABREC TO BUFF"; "MOVE
DAMAGE ASSESSMENT FIELD INTO RECORD, MOVE ABREC TO BUFFER"; "DETERMINE SPACE NEEDED
FOR RECORD BY DEVICE TYPE (NOTE 2)"; "STORE REMAINING BYTE COUNT IN HEADER, UPDATE
LAST ID WRITTEN IN HEADER"; "TURN ON FLAG TO INDICATE LOGREC FULL"; "SET UP CCW'S TO
WRITE RECORD ON SYS1.LOGREC"; "TURN ON FLAG TO INDICATE NO ROOM FOR LONG REC".)

NOTE 2:

     DEVICE TYPE    SPACE NEEDED
     2311           61+(537(DL)/512)
     2301           133+DL
     2303           108+DL
     2305A          432+2+DL
     2305C          91+DL
     2314           101+(2137(DL)/2048)
     3330           135+DL
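NOTE 2 can be read as a lookup table of per-device formulas. The sketch below assumes DL is the data length of the record in bytes and that fractional results of the 2311 and 2314 formulas are rounded up; both are assumptions, and the 2305A entry is transcribed exactly as printed even though that entry may be damaged in this copy.

```python
from typing import Callable, Dict


def _ceil_div(a: int, b: int) -> int:
    """Integer division rounded up (assumed rounding rule)."""
    return -(-a // b)


# Formulas from NOTE 2, keyed by device type; dl is the record's data length.
SPACE_NEEDED: Dict[str, Callable[[int], int]] = {
    "2311": lambda dl: 61 + _ceil_div(537 * dl, 512),
    "2301": lambda dl: 133 + dl,
    "2303": lambda dl: 108 + dl,
    "2305A": lambda dl: 432 + 2 + dl,   # transcribed as printed
    "2305C": lambda dl: 91 + dl,
    "2314": lambda dl: 101 + _ceil_div(2137 * dl, 2048),
    "3330": lambda dl: 135 + dl,
}


def space_needed(device: str, dl: int) -> int:
    try:
        return SPACE_NEEDED[device](dl)
    except KeyError:
        raise ValueError(f"no formula for device type {device!r}") from None
```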
IGFMCHE2. Error Recorder (Part 3 of 4)

(Flowchart not reproduced. Surviving block text: "WRITE RECORD ON SYS1.LOGREC"; "WRITE
EOF ON LAST TRACK"; "BUILD AN UPDATED HEADER RECORD"; "EXCPCALL 04A2".)
IGFMCHE2. Error Recorder (Part 4 of 4)

EXCPCALL: ENTERED FROM 01J2, 03B3, 03E4, 03F3, AND 03H3 TO READ AND WRITE FROM
SYS1.LOGREC. RETURN TO CALLER IS VIA BRANCH ON REG 2 TO 01K2, 03B3, 03E4, 03F3, OR
03H3.

(Flowchart not reproduced. Surviving block text: "CLEAR EXCP ECB"; "WAIT MACRO"; "SET
FLAG FOR I/O FAILURE MESSAGE".)
IGFMCHE1. Console Write Routine

(Flowchart not reproduced. Surviving block text: "GET ADDRESS OF BEGINNING OF MESSAGE
BUFFER"; "MOVE CODE FOR WTO INTO COMMON AREA"; "TURN OFF LOGREC FULL FLAG, GET ADDR
LOGREC FULL MSG (IGF054E)"; "TURN OFF DISK FORMAT ERROR FLAG, GET ADDR OF MSG";
"REINITIALIZE POINTER TO MESSAGE BUFFER"; "INITIALIZE PTR TO MSG TABLE AND MSG BUFFER";
"SET UP MOVE INSTRUCTION FOR FIXED MESSAGE"; "UPDATE TABLE POINTER TO NEXT ENTRY IN
SEARCH"; "SET UP MOVE INSTRUCTION FOR DYNAMIC MSG".)

WAIT ROUTINE: TO IEECMQWR (MVT) OR IEECMAWR (MFT) VIA BR 14.
IGFMCHE3. Emergency Recorder (Part 1 of 3)

CCW'S ARE BUILT IN MCHWORK OF THE COMMON AREA SINCE IGFMCHE3 IS A TRANSIENT ROUTINE.
CCW'S ARE TO READ HEADER FROM SYS1.LOGREC.
IGFMCHE3. Emergency Recorder (Part 2 of 3)

(Flowchart not reproduced. Surviving block text: "GET LENGTH OF SHORT RECORD AND MOVE
ABREC TO BUFFER"; "GET LENGTH OF LONG RECORD AND SET IT UP IN BUFF"; "SET UP PARTS OF
CCW'S COMMON TO LONG AND SHORT RECS"; "FILL IN PARTS OF CCW'S COMMON TO LONG, SHORT,
AND CCH RECS"; "DETERMINE SPACE NEEDED ACCORDING TO DEVICE TYPE (NOTE 1)"; "GET ADDRESS
OF CCH RECORD, SET UP COUNT BYTE"; "TURN ON FLAG TO INDICATE NO ROOM FOR LONG REC".)

NOTE 1:

     DEVICE TYPE    SPACE NEEDED
     2311           61+(537(DL)/512)
     2301           133+DL
     2303           108+DL
     2305A          432+2+DL
     2305C          91+DL
     2314           101+(2137(DL)/2048)
     3330           135+DL
IGFMCHE3. Emergency Recorder (Part 3 of 3)

IORTN: ENTERED FROM BLOCKS 01H1, 03A1, 03B2, 03C1, 03C4 TO READ OR WRITE FROM
SYS1.LOGREC.

(Flowchart not reproduced. Surviving text: "MOD LOADER 07A2"; "AT BLOCK 01H1, 03A1,
03B2, 03C1, OR 03C4".)
IGF29701. Model 145 Machine Status Control

ENTERED FROM IGF2603D (MODE COMMAND ROUTER MODULE FOR MODEL 145).

(Flowchart not reproduced. Surviving block text: "SET ERR CODE 10 FOR MSG ASSEMBLY
MODULE"; "SET ERR CODE 8 FOR MSG MODULE ASSEMBLY"; ERROR EXIT MODULE "TO IGC0503D VIA
XCTL"; "SET MAIN STORAGE INTO QUIET MODE VIA DIAGNOSE INSTRUCTION"; "SET MAIN STOR INTO
RECORD MODE VIA DIAGNOSE INSTRUCTION"; "SET CONTROL STORE IN RECORD VIA DIAGNOSE
INSTRUCTION"; "SET CONTROL STORE IN QUIET VIA DIAGNOSE INSTRUCTION"; "SET CONTROL STORE
IN THRESHOLD MODE VIA DIAGNOSE INSTRUCTION"; "GETMAIN BUFFER"; "MOVE MESSAGE INDICATING
MODE SWITCH TO OUTPUT BUFFER"; "FREEMAIN BUFFER"; EXIT "VIA SVC 3".)
IGF13501. Model 135 Machine Status Control

ENTERED FROM IGF2603D (MODE COMMAND ROUTER MODULE FOR MODEL 135).

(Flowchart not reproduced. Surviving block text: "SET ERROR CODE 'E' FOR MESSAGE
ASSEMBLY MODULE"; ERROR EXIT MODULE "TO IGC0503D VIA XCTL"; decision on the 'STATUS'
PARAMETER; "SET CR 14 BIT TO ENABLE RECOVERY REPORTS"; "SET STORAGE INTO RECORD MODE VIA
DIAGNOSE"; "SET HIR RECORD MODE INDICATOR IN MSB"; "SET ECC RECORD MODE INDICATOR IN
MSB"; "RESTART HIR ERROR COUNT IN MSB"; "CONVERT SOFT ERROR COUNTS FOR MESSAGE"; "MOVE
CONVERTED COUNT INTO MESSAGE"; "MOVE CURRENT MODES INTO MESSAGE"; "MOVE DEFAULT
THRESHOLD IN MESSAGE"; "MOVE BASIC MESSAGE TO WORK AREA"; "WTO - ISSUE CONSOLE
MESSAGE"; "WTO - ISSUE CONSOLE MESSAGE IGF061I"; "FREEMAIN - RELEASE WORK AREA"; "EXIT
VIA SVC 3".)
SECTION 4: MCH DATA AREAS



This section contains descriptions of 
the data areas: 

Model Dependent Common Area 

MCH Independent Common Area 

Record Buffer Build Area 

Fixed Logout 

Extended Logout 

Damage Assessment Field Buffer Area 

Subsystem Data Area 

Machine Status Block 

Figure 3 shows the location of these 
data areas relative to the MCH program and 
one another. 

Note: The Record Buffer Build Area, Fixed
Logout, Extended Logout, and Damage Assess- 
ment Field Buffer Area together constitute 
the basic MCH record written for each 
machine-check interruption. The techniques 
of buffering and writing these areas are 
discussed in this section under Record 
Buffer Build Area. 

MODEL DEPENDENT COMMON AREA 

The Model Dependent Common Area occupies 
8 bytes in the MCH Resident Area: 

Bytes 0-5 are reserved.

Bytes 6-7 contain the length of the
damage assessment field (see
MCHDAMAG in Figure 13).
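As an illustration of reading this area from a dump, the sketch below extracts the damage assessment field length from bytes 6-7. It assumes the area is available as an 8-byte `bytes` object and that the length is an unsigned halfword in System/370 (big-endian) byte order; the function name is hypothetical.

```python
import struct


def damage_field_length(dependent_area: bytes) -> int:
    """Return the damage assessment field length held in bytes 6-7 of the
    8-byte Model Dependent Common Area (unsigned halfword, big-endian)."""
    if len(dependent_area) != 8:
        raise ValueError("the Model Dependent Common Area is 8 bytes")
    (length,) = struct.unpack_from(">H", dependent_area, 6)
    return length
```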

MCH INDEPENDENT COMMON AREA 

The MCH Independent Common Area occupies 
1024 bytes in the MCH Resident Area. It is 
used by the MCH modules to communicate with 
one another and to store data to be entered 
into the environmental record. Each field 
in the Common Area is described below, in 
alphabetic order, with each field's displace-
ment in decimal and hexadecimal from the
beginning of the Common Area. A storage 
map of the Common Area follows the descrip- 
tion (see Figure 23). 

MCHABREC 737 (2E1) 

Three-byte pointer to ABREC records,
TTR table, and successor list. 



MCHABRNO 736 (2E0) 

One byte containing the number (in 
hexadecimal) of abbreviated records. 

MCHASRNO 354 (162) 

One byte containing the number (in 
hexadecimal) of checksum records. 

MCHASRTR 352 (160) 

One byte containing the record number
(in hexadecimal) of the first checksum 
record on SYS1.ASRLIB. 

MCHBLDL 344 (158) 

A pointer to the LINKLIB BLDL table. 

MCHBUILD 720 (2D0) 

A one-word pointer to the MCH error 
record build area. 

MCHCVT 308 (134) 

A pointer to the CVT. 

MCHDAMAG 8 (8) 

Two-word field containing error infor- 
mation to be used as part of the 
environmental record. (Figure 12 
illustrates the contents of this 
field.)

MCHDCB 400 (190) 

A pseudo DCB containing one word for 
use in loading records. 

MCHDEB 404 (194) 

A pseudo DEB containing twelve words,
the first six of which are all zeros.
Fields included in the pseudo DEB are 
MCHDEDCB, MCHDEXSC, MCHDEBXT, 
MCHSTEXT, MCHENEXT, and MCHNMTRK. 
Figure 24 describes the fields of 
MCHDEB. 

MCHDEBXT 428 (1AC) 

One word containing the file mask and 
the address of the UCB. 

MCHDEDCB 420 (1A4) 

One word containing the protect key, 
the DEB ID, and a pointer to the asso- 
ciated DCB. This field is used by IOS 
to read checksums from SYS1.ASRLIB. 

MCHDEXSC 424 (1A8) 

One word containing an exit scale for
direct-access devices and the address
of an appendage table.

MCHDISPL 356 (164) 

One byte containing the displacement 
into the successor table of the successor
ID. 






Dec (Hex) | Field | Length (bytes)
0 (0) | MCHDPADR | 4 (See note)
4 (4) | MCHLOGIC | 4
8 (8) | MCHDAMAG | 8
16 (10) | MCHLSUM | 8
24 (18) | MCHHISTY | 28
52 (34) | REPDAR1, REPDAR2, REPDAR3, REPDAR4 | 1 each
56 (38) | REPDAR5, REPDAR6, REPDAR7, REPDAR8 | 1 each
60 (3C) | REPDARF1 | 4
64 (40) | REPDARF2 | 4
68 (44) | REPDARI | 4
72 (48) | REMCOPSW | 8
80 (50) | REPDAR | 16
96 (60) | MCHTTRS | 4
100 (64) | MCHPSA | 128
228 (E4) | MCHMLSAV | 64
292 (124) | MCHNXIDS | 4
296 (128) | MCHTCB | 4
300 (12C) | MCHTTRIN | 4
304 (130) | MCHSHUT | 4
308 (134) | MCHCVT | 4
312 (138) | MCHSIRB | 4
316 (13C) | MCHERXNT | 4
320 (140) | MCHPSTAD | 4
324 (144) | MCHLDADR | 4
328 (148) | MCHEXCP | 4
332 (14C) | MCH2NDRY | 4
336 (150) | MCHIOENT | 4
340 (154) | MCHERIOB | 4
344 (158) | MCHBLDL | 4
348 (15C) | MCHMSB | 4
352 (160) | MCHASRTR | 1
353 (161) | MCHRELNO | 1
354 (162) | MCHASRNO | 1
355 (163) | MCHNXMOD | 1
356 (164) | MCHDISPL | 1
357 (165) | MCH1STDS | 1
358 (166) | MCHINTEL | 2
360 (168) | MCHSUBA | 2
362 (16A) | MCHSUBF | 2
364 (16C) | MCHSUBP | 4
368 (170) | MCHNEST | 4
372 (174) | MCHSVCBL | 4

Figure 23. MCH independent common area (Part 1 of 2)
Dec (Hex) | Field | Length (bytes)
376 (178) | MCHSPARE | 20
396 (18C) | MCHUCB | 4
400 (190) | MCHDCB | 4
404 (194) | MCHDEB | 16
420 (1A4) | MCHDEDCB | 4
424 (1A8) | MCHDEXSC | 4
428 (1AC) | MCHDEBXT | 4
432 (1B0) | (Spare) | 2
434 (1B2) | MCHSTEXT | 4
438 (1B6) | MCHENEXT | 4
442 (1BA) | MCHNMTRK | 2
444 (1BC) | MCHIOB | 4
448 (1C0) | MCHIOECB | 4
452 (1C4) | MCHIOCSW | 8
460 (1CC) | MCHICCWS | 4
464 (1D0) | MCHIODCB | 4
468 (1D4) | MCHIOBSP | 8
476 (1DC) | MCHIOBSK | 8
484 (1E4) | SPARE | 4
488 (1E8) | MCHINT | 4
492 (1EC) | MCHIPTR | 4
496 (1F0) | MCHIBUF | 100
596 (254) | MCHWORK | 120
716 (2CC) | MCHLONG | 4
720 (2D0) | MCHBUILD | 4
724 (2D4) | MCHINLOG | 4
728 (2D8) | MCHREMCH | 4
732 (2DC) | MCHENQ | 4
736 (2E0) | MCHABRNO | 1
737 (2E1) | MCHABREC | 3

Figure 23. MCH independent common area (Part 2 of 2)
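When reading a dump of the Common Area, it helps to map a displacement back to the field that contains it. The following Python sketch does this for a subset of the fields in Figure 23 (the table is deliberately partial and can be extended from the field descriptions) and formats displacements in this section's "decimal (hexadecimal)" style. All names are hypothetical.

```python
import bisect

# A partial copy of Figure 23: (decimal displacement, field name, length).
# Entries must stay sorted by displacement.
COMMON_AREA_FIELDS = [
    (0, "MCHDPADR", 4), (4, "MCHLOGIC", 4), (8, "MCHDAMAG", 8),
    (16, "MCHLSUM", 8), (24, "MCHHISTY", 28), (72, "REMCOPSW", 8),
    (80, "REPDAR", 16), (96, "MCHTTRS", 4), (100, "MCHPSA", 128),
    (228, "MCHMLSAV", 64), (348, "MCHMSB", 4), (358, "MCHINTEL", 2),
    (444, "MCHIOB", 4), (452, "MCHIOCSW", 8), (496, "MCHIBUF", 100),
    (596, "MCHWORK", 120), (716, "MCHLONG", 4), (720, "MCHBUILD", 4),
    (736, "MCHABRNO", 1), (737, "MCHABREC", 3),
]
_STARTS = [start for start, _, _ in COMMON_AREA_FIELDS]


def field_at(displacement: int):
    """Return (field name, offset within the field), or None when the
    displacement is not inside any field listed above."""
    i = bisect.bisect_right(_STARTS, displacement) - 1
    if i < 0:
        return None
    start, name, length = COMMON_AREA_FIELDS[i]
    if displacement < start + length:
        return name, displacement - start
    return None


def dec_hex(displacement: int) -> str:
    """Format a displacement as this section does, for example '737 (2E1)'."""
    return f"{displacement} ({displacement:X})"
```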
Byte | Field Name | Length (bytes) | Contents
0-23 | MCHDEB | 24 | Pseudo field containing zeros
24-27 | MCHDEDCB | 4 | Protect key, pointer to DCB, DEB ID
28-31 | MCHDEXSC | 4 | Exit scale and appendage table
32-35 | MCHDEBXT | 4 | File mask
36-37 | | 2 | Reserved
38-41 | MCHSTEXT | 4 | Starting CCHH of extent
42-45 | MCHENEXT | 4 | Ending CCHH of extent
46-47 | MCHNMTRK | 2 | Number of tracks

Figure 24. Fields of MCHDEB



MCHDPADR 0 (0)

One word containing the address of the
Model Dependent Common Area.



MCHENEXT 438 (1B6) 

The ending CCHH of the extent contain- 
ing the specified transient module. 



MCHENQ 732 (2DC)

Reserved.

MCHERIOB 340 (154) 

The address of the IOB for the MVT 
error transient area. 

MCHERXNT 316 (13C) 

A pointer to the MFT error transient 
area. 

MCHEXCP 328 (148) 

Entry into MCH/IOS interface. 

MCHHISTY 24 (18) 

Seven words containing the type and 
order of modules called during error 
processing. (See Section 5: "Diag-
nostic Aids.")

MCHIBUF 496 (1F0) 

A 25-word message buffer. 

MCHICBSP 440 (1B8)

Reserved.

MCHICCWS 460 (1CC)

Address of the channel program used to 
service all MCH I/O requests. 



MCHINLOG 724 (2D4) 

A one-word pointer to the model inde- 
pendent logout save area. 

MCHINT 488 (1E8) 

One word used to inform the Emergency 
Recorder whether a CCH record must be 
recorded. This field also tells the 
Console Write routine the condition of 
SYS1.LOGREC. It is used to interface
between MCH and the recording and
console write routines.

MCHINTEL 358 (166) 

A two-byte field containing indicators 
used by the Secondary Error Handler. 
Figure 25 describes the significant 
bits in MCHINTEL. 

MCHIOB 444 (1BC) 

A flag word used by the Module Loader. 
It also denotes the entire group of 
fields in Figure 26. 

MCHIOBSK 476 (1DC)

Two words containing the SEEK address 
(MBBCCHHR) of the module to be loaded. 
The Emergency Recorder uses this field 
to read and write header records and 
write error records. 

MCHIOBSP 468 (1D4) 

IOB displacement space. 

MCHIOCSW 452 (1C4) 

Two words containing the last seven 
bytes of the CSW. 

MCHIODCB 464 (1D0) 

The address of the DCB of SYS1.SVCLIB 
for module loading or the address of 
the DCB of SYS1.LOGREC for emergency
recording. 



Byte | Bit | Meaning
0 | 0 | System down if unexpected error changes system data
0 | 1 | Continue MCH to termination; schedule emergency recorder (put system down)
0 | 2 | Put system down with scheduled message
0 | 3 | Reserved
0 | 4 | Reserved
0 | 5 | Terminator scheduled from Secondary Handler
0 | 6 | Status change has occurred
0 | 7 | Error record recorded
1 | 0 | Indicates Module Loader used
1 | 1-7 | Reserved

Figure 25. Fields of MCHINTEL
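The bit meanings in Figure 25 can be decoded mechanically. This sketch assumes IBM bit numbering, in which bit 0 is the high-order bit of a byte (X'80'); `decode_mchintel` is a hypothetical helper, not part of MCH.

```python
# (byte, bit) -> meaning, transcribed from Figure 25; reserved bits omitted.
MCHINTEL_BITS = {
    (0, 0): "System down if unexpected error changes system data",
    (0, 1): "Continue MCH to termination; schedule emergency recorder",
    (0, 2): "Put system down with scheduled message",
    (0, 5): "Terminator scheduled from Secondary Handler",
    (0, 6): "Status change has occurred",
    (0, 7): "Error record recorded",
    (1, 0): "Module Loader used",
}


def decode_mchintel(field: bytes) -> list[str]:
    """Return the meaning of every defined bit that is on, byte 0 first."""
    if len(field) != 2:
        raise ValueError("MCHINTEL is a two-byte field")
    meanings = []
    for (byte, bit), meaning in MCHINTEL_BITS.items():
        if field[byte] & (0x80 >> bit):  # bit 0 is X'80'
            meanings.append(meaning)
    return meanings
```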
Byte | Field Name | Length (bytes) | Content
0 | MCHIOB | 1 | Flag for command chaining
1-3 | | 3 | Not used
4-7 | | 4 | Pointer to MCHECB
8 | MCHIOCSW | 1 | I/O Error Flags (IOBFLAG3)
9-15 | | 7 | Last 7 bytes of CSW
16-19 | MCHICCWS | 4 | Address of channel program
20-23 | MCHIODCB | 4 | Address of DCB
24-31 | | 8 | Reserved
32-39 | MCHIOBSK | 8 | SEEK field (MBBCCHHR)

Figure 26. Fields of MCHIOB
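The MCHIOB layout maps naturally onto a fixed-format structure. The sketch below unpacks a 40-byte copy of MCHIOB with Python's `struct` module, assuming big-endian byte order and the conventional meanings of the MBBCCHHR subfields (extent number, bin, cylinder, head, record); the class and function names are hypothetical.

```python
import struct
from typing import NamedTuple


class SeekAddress(NamedTuple):
    m: int   # extent number (M)
    bb: int  # bin (BB)
    cc: int  # cylinder (CC)
    hh: int  # head (HH)
    r: int   # record (R)


class MchIob(NamedTuple):
    flag: int          # byte 0: flag for command chaining
    ecb_addr: int      # bytes 4-7: pointer to MCHECB
    error_flags: int   # byte 8: I/O error flags (IOBFLAG3)
    csw_tail: bytes    # bytes 9-15: last 7 bytes of the CSW
    ccw_addr: int      # bytes 16-19: address of channel program
    dcb_addr: int      # bytes 20-23: address of DCB
    seek: SeekAddress  # bytes 32-39: SEEK field (MBBCCHHR)


# ">" = big-endian, no alignment; "x" = skipped byte (bytes 1-3, 24-31)
_IOB = struct.Struct(">B3xIB7sII8x8s")
_SEEK = struct.Struct(">BHHHB")


def parse_mchiob(raw: bytes) -> MchIob:
    if len(raw) != _IOB.size:  # 40 bytes
        raise ValueError(f"expected {_IOB.size} bytes, got {len(raw)}")
    flag, ecb, err, csw, ccw, dcb, seek = _IOB.unpack(raw)
    return MchIob(flag, ecb, err, csw, ccw, dcb, SeekAddress(*_SEEK.unpack(seek)))
```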



MCHIOECB 448 (1C0)

A pointer to the MCH ECB. 

MCHIOENT 336 (150) 

Entry point address of IOS. 

MCHIPTR 492 (1EC) 

A one-word pointer to the WTO message 
buffer. 

MCHLDADR 324 (144) 

Entry point of MCH Module Loader. 

MCHLOGIC 4 (4) 

One word containing control bits used 
by the MCH Nucleus to determine which 
model-dependent MCH module should be
designated as successor. 

MCHLONG 716 (2CC) 

A one-word pointer to the long error 
record. (See "Error Recording" in 
Section 2.) 

MCHLSUM 16 (10) 

Two words denoting the type and number 
(in hexadecimal) of records lost due 
to being overlaid by new error rec- 
ords. The field is formatted as shown 
in Figure 27. 

MCHMLSAV 228 (E4) 

Sixteen words used by I/O and tran- 
sient routines to save registers. 

MCHMSB 348 (15C) 

A pointer to the Machine Status Block. 



Byte | Meaning
0 | Number of tasks terminated due to CPU errors.
1 | Number of tasks recovered from CPU errors.
2 | Number of errors recovered by CPU retry.
3 | Number of tasks terminated due to storage errors.
4 | Number of tasks recovered from storage errors.
5 | Number of errors recovered by ECC.
6 | Reserved.
7 | Reserved.

Figure 27. Fields of MCHLSUM
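Because each MCHLSUM counter occupies one byte, decoding the field is a positional mapping. A minimal sketch, assuming the byte order shown in Figure 27 (bytes 6 and 7 reserved); the names are hypothetical:

```python
MCHLSUM_COUNTERS = (
    "tasks terminated due to CPU errors",      # byte 0
    "tasks recovered from CPU errors",         # byte 1
    "errors recovered by CPU retry",           # byte 2
    "tasks terminated due to storage errors",  # byte 3
    "tasks recovered from storage errors",     # byte 4
    "errors recovered by ECC",                 # byte 5
)


def decode_mchlsum(field: bytes) -> dict[str, int]:
    """Map each lost-record counter name to its one-byte count."""
    if len(field) != 8:
        raise ValueError("MCHLSUM is two words (8 bytes)")
    # Bytes 6 and 7 are reserved and are not reported.
    return dict(zip(MCHLSUM_COUNTERS, field[:6]))
```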



MCHNEST 368 (170) 

A pointer to the IOS nest switch. 

MCHNMTRK 442 (1BA) 

Number of tracks in the extent of the 
data set being referenced. 

MCHNXIDS 292 (124) 

A pointer to an eight-word field con-
taining the name of each MCH module
and a list of the IDs of its successor 
modules. The format of this field is: 



Name (ID) | Number of Successors | ID of Successor 1 | ID of Successor n

where each ID occupies one byte
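The successor-list format above can be parsed one entry at a time. This Python sketch assumes the entry is available as raw bytes and that the count byte gives the number of one-byte successor IDs that follow; the names, and the sample IDs in any test data, are hypothetical.

```python
from typing import NamedTuple


class SuccessorEntry(NamedTuple):
    module_id: int
    successors: tuple[int, ...]


def parse_successor_entry(data: bytes, offset: int = 0) -> tuple[SuccessorEntry, int]:
    """Parse one entry laid out as: module ID, count n, then n successor
    IDs (one byte each). Return the entry and the offset just past it."""
    if offset + 2 > len(data):
        raise ValueError("truncated entry header")
    module_id, count = data[offset], data[offset + 1]
    end = offset + 2 + count
    if end > len(data):
        raise ValueError("successor list runs past the end of the data")
    return SuccessorEntry(module_id, tuple(data[offset + 2:end])), end
```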



MCHNXMOD 355 (163) 

One byte denoting functional successor 
(set up by transient modules for use 
by the Module Scheduler).

MCHPSA 100 (64) 

Thirty-two words to save the Permanent
Storage Allocation (locations 0-127
decimal) at the time of the interruption.

MCHPSTAD 320 (140) 

The address of the MCH posting 
routine. 

MCHRELNO 353 (161) 

Operating system release number (in 
hexadecimal).

MCHREMCH 728 (2D8) 

A one-word pointer to the damage
assessment area. 
MCHRPSW (see REMCOPSW)



MCHSHUT 304 (130) 

A word containing an address used by 
the Emergency Recorder to return to 
the SHUT routine. 

MCHSIRB 312 (138) 

A pointer to the SIRB. 

MCHSPARE 376 (178) 

Reserved for future use. 

MCHSTEXT 434 (1B2) 

The starting CCHH of the extent of the 
data set being referenced. 

MCHSUBA 360 (168) 

This field denotes the subsystems run- 
ning under the operating system. TSO 
is indicated by hexadecimal 80. 

MCHSUBF 362 (16A) 

This field denotes subsystems for 
which no MCH support is provided. 

MCHSUBP 364 (16C) 

A pointer to the subsystem data area 
(MCHSUB).

MCHSVCBL 372 (174) 

A pointer to the SVCLIB BLDL table. 

MCHTCB 296 (128) 

One word used by the PDAR modules to 
store the address of the current TCB. 

MCHTTRIN 300 (12C) 

One word containing the TTR of the 
next transient module to be loaded. 
The TTR must be converted to an abso- 
lute address. 

MCHTTRS 96 (60) 

A one-word field containing a pointer 
to the field containing the TTRs, IDs,
and displacements into the successor 
table of all MCH transient modules. 

MCHUCB 396 (18C) 

A one-word field containing the UCB 
address for I/O routines. 

MCHWORK 596 (254) 

A 120-byte work area for the error
recorder. 

MCH1STDS 357 (165) 

A one-byte field containing the dis-
placement to the successor IDs of
IGFMCHE0 for scheduling of the first 
successor module. 

MCH2NDRY 332 (14C) 

A pointer to the second MCH Indepen- 
dent Common Area. 



REMCOPSW 72 (48) 

A doubleword for saving the machine-
check old PSW.

REPDAR 80 (50) 

A 16-byte field containing the job 
name and step name of the interrupted 
program. 

REPDARF1 60 (3C) 

One word containing the starting 
address of a failing location. 

REPDARF2 64 (40) 

A one-word location containing the ending
address of a failing location.

REPDARI 68 (44) 

A one-word location containing the
instruction address at the time of the 
failure. 

REPDAR1 52 (34)

A one-byte field containing action(s)
taken by PDAR for a specific failure. 

REPDAR2 53 (35) 

A continuation of REPDAR1.

REPDAR3 54 (36) 

A one-byte field describing operating 
system status. 

REPDAR4 55 (37) 

A one-byte field used by PDAR to indi-
cate error location.

REPDAR5 56 (38) 

A one-byte field indicating instruc- 
tion location at the time of the 
error. 

REPDAR6 57 (39) 

A one-byte field indicating the sched- 
uling of messages to the operator 
denoting action taken by PDAR modules. 

REPDAR7 58 (3A) 

A reserved byte; not used 

REPDAR8 59 (3B) 

A reserved byte; not used 

Figure 28 describes the contents of 
REPDAR1 through REPDAR8.



RECORD BUFFER BUILD AREA 

The Record Buffer Build Area occupies 80 
bytes following the Independent Common Area 
(see Figure 3). The first 2 bytes are
unused except for boundary alignment. The 
next 8 bytes, labeled CTFIELD, contain con- 
trol information; and the last 70 bytes 
contain the short form of the MCH record, 
called the ABREC. 






REPDAR1: PDAR Action

Bit | Content | Indicates
0 | 1 | Termination necessary
1 | 1 | Repair/retry failed
2 | 1 | Retry possible (Unused)
3 | 1 | Indeterminate instruction counter
4 | 1 | Instruction involved
5 | 1 | Operand involved
6 | 1 | System wait if Refresh/repair/retry fails
7 | 1 | Inhibit termination (Unused)

REPDAR2: PDAR Action

Bit | Content | Indicates
0 | 1 | Solid storage data error
1 | 1 | Intermittent storage data error
2 | 1 | Solid SPF key error
3 | 1 | Intermittent SPF key error
4 | 1 | Refresh/repair successful
5 | 1 | Storage data error location cleared
6 | 1 | Storage block failure
7 | 1 | Storage unit failure

REPDAR3: Operating System Status

Bit | Content | Indicates
0 | 1 | Wait pseudo task
1 | 1 | Master scheduler task
2 | 1 | System task
3 | 1 | Problem program task
4 | 1 | Current PSW disabled for I/O and external interrupts
5-7 | | Unused

REPDAR4: Location of Error

Bit | Content | Indicates error in
0 | 1 | Nucleus
1 | 1 | SVC transient area
2 | 1 | Error transient area

Figure 28. PDAR control and action bytes (Part 1 of 2)
REPDAR4 (Cont'd): Location of Error

Bit | Content | Indicates error in
3 | 1 | Refreshable nucleus CSECT
4 | 1 | Dynamic area
5 | 1 | Link pack area
6 | 1 | Resident type III SVC
7 | 1 | BLDL table

REPDAR5: Location of Instruction When Failing Address Involves Operand

Bit | Content | Indicates instruction is in
0 | 1 | Nucleus
1 | 1 | Dynamic area
2 | 1 | Link pack area
3-7 | | Reserved for future use

REPDAR6: Messages

Bit | Content | Meaning
0 | 1 | Schedule unrecoverable supervisor error message
1 | 1 | Schedule unretryable supervisor error message
2 | 1 | Schedule unrecoverable error in dynamic area message
3 | 1 | Schedule task ABEND message
4 | 1 | Schedule LINKLIB BLDL deleted and task ABEND message
5 | 1 | Schedule SVC BLDL deleted and task ABEND message
6-7 | | Unused

REPDAR7 and REPDAR8: Reserved for future use

Figure 28. PDAR control and action bytes (Part 2 of 2)
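Testing an individual PDAR byte against Figure 28 amounts to checking each bit from the high-order end. The following sketch encodes REPDAR1 and REPDAR4 as examples; the bit assignments are taken from Figure 28, IBM bit numbering is assumed (bit 0 is X'80'), and `bits_set` is a hypothetical helper.

```python
# Bit meanings transcribed from Figure 28, bit 0 first.
REPDAR1_ACTION = (
    "Termination necessary",
    "Repair/retry failed",
    "Retry possible (unused)",
    "Indeterminate instruction counter",
    "Instruction involved",
    "Operand involved",
    "System wait if refresh/repair/retry fails",
    "Inhibit termination (unused)",
)
REPDAR4_LOCATION = (
    "Nucleus",
    "SVC transient area",
    "Error transient area",
    "Refreshable nucleus CSECT",
    "Dynamic area",
    "Link pack area",
    "Resident type III SVC",
    "BLDL table",
)


def bits_set(value: int, meanings) -> list[str]:
    """Return the meaning of every bit that is on in a one-byte field."""
    return [m for bit, m in enumerate(meanings) if value & (0x80 >> bit)]
```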



When writing a short record, MCH moves 
the current ABREC from MCHABREC of the 
Independent Common Area to the 70 bytes 
reserved for it in the Record Buffer Build 
Area. MCH then writes the short record 
from this buffer to SYS1.LOGREC.

When writing a full record, MCH incre- 
ments its pointer to CTFIELD by 22 bytes 
and moves the first 48 bytes of the current 
ABREC from MCHABREC to the area following 
the new CTFIELD location. MCH can then 
write a full record from a contiguous area 
containing:

1. First 48 bytes of ABREC 

2. Fixed Logout 



3. Extended Logout 



4. Damage Assessment Field Buffer Area 



The contents of the short MCH record or 
ABREC are shown in Figure 29. The 48 bytes 
of ABREC used in writing a long MCH record 
are Key through Machine-Check Interruption 
Code. All of the ABREC is used when writ- 
ing a short MCH record. 
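The 22-byte adjustment follows from the sizes given above: the long record carries only 48 of the 70 ABREC bytes, so moving CTFIELD 70 - 48 = 22 bytes higher makes the shortened ABREC end exactly where the full ABREC does, keeping it contiguous with the Fixed Logout data that follows. The Python sketch below computes both layouts; offsets are relative to the start of the 80-byte build area, and the names are illustrative.

```python
BUILD_AREA_SIZE = 80
PAD = 2                # alignment bytes at the start of the area
CTFIELD_LEN = 8
ABREC_LEN = 70
LONG_ABREC_LEN = 48    # Key through Machine-Check Interruption Code
SHIFT = ABREC_LEN - LONG_ABREC_LEN  # 22 bytes

assert PAD + CTFIELD_LEN + ABREC_LEN == BUILD_AREA_SIZE


def layout(long_record: bool) -> dict[str, tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, of CTFIELD and ABREC."""
    ct_start = PAD + (SHIFT if long_record else 0)
    abrec_start = ct_start + CTFIELD_LEN
    abrec_len = LONG_ABREC_LEN if long_record else ABREC_LEN
    return {
        "CTFIELD": (ct_start, abrec_start),
        "ABREC": (abrec_start, abrec_start + abrec_len),
    }
```

In both layouts the ABREC portion ends at offset 80, the end of the build area.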

MCHBUILD in the Independent Common Area 
points to CTFIELD in the Record Buffer 
Build Area. CTFIELD is filled in by MCH 
recorders, IGFMCHE2 and IGFMCHE3, and 
contains: 
Offset marks in the original figure (decimal): 8, 12, 16, 24, 32, 40, 48, 56, 64, 68

Fields, in order (length in bytes where printed):

Key | OS Release Number
Reserved (4)
Switches
Date
Time
CPU ID (8)
Program ID (8)
Job ID (8)
Machine-Check Old PSW (8)
Machine-Check Interruption Code (8)
Damage Assessment Data (8)
Lost CPU ABEND Counter | Lost CPU Recovered Counter | Lost CPU Soft Counter
Lost ECC ABEND Counter | Lost ECC Recovered Counter | Lost ECC Soft Counter
Flag (See Figure 22) | Reserved

Figure 29. Fields of ABREC



1. The CCHHR (5 bytes) of the next record 
entry available on SYS1.LOGREC. 

2. The key length (1 byte) of the MCH 
record. 

3. The data length (2 bytes) of the MCH 
record. 
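The 8 bytes of CTFIELD have the layout of a count field. As a sketch, assuming big-endian order and the usual split of CCHHR into a 2-byte cylinder (CC), a 2-byte head (HH), and a 1-byte record number (R), CTFIELD could be packed and unpacked with Python's `struct` module; the function names are hypothetical.

```python
import struct

# CC (2) + HH (2) + R (1) + key length (1) + data length (2) = 8 bytes
CTFIELD = struct.Struct(">HHBBH")


def pack_ctfield(cc: int, hh: int, r: int, key_len: int, data_len: int) -> bytes:
    return CTFIELD.pack(cc, hh, r, key_len, data_len)


def unpack_ctfield(raw: bytes) -> tuple[int, int, int, int, int]:
    return CTFIELD.unpack(raw)
```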

Note: There may be no Extended Logout.
In this case, MCH sets up the Damage
Assessment Field Buffer Area immediately
following the Fixed Logout.

Figure 30 shows the possible kinds of 
MCH error records that can be written. 



FIXED LOGOUT 

The Fixed Logout occupies main storage 
locations 176 through 511 (decimal). When
MCH is first entered, it moves Fixed Logout 
data from locations 232 through 511 to the 



area immediately following the ABREC in the 
Record Buffer Build Area (see Figure 3). 

Fixed Logout locations 176 through 231 
are not used by MCH to create the full MCH 
record. 

Figure 31 shows the contents of the 
Fixed Logout in storage locations 176-511. 
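For dump analysis, the portion of the Fixed Logout that MCH copies can be sliced directly from an image of low storage. A minimal sketch, assuming the image begins at location 0; the names are hypothetical.

```python
FIXED_LOGOUT = range(176, 512)  # locations 176-511 (decimal)
RECORDED = range(232, 512)      # locations MCH moves into the record


def recorded_fixed_logout(low_storage: bytes) -> bytes:
    """Return the Fixed Logout data (locations 232-511) that MCH copies
    into the record, from a storage image starting at location 0."""
    if len(low_storage) < RECORDED.stop:
        raise ValueError("storage image does not reach location 511")
    return low_storage[RECORDED.start:RECORDED.stop]
```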



EXTENDED LOGOUT 

Producing an Extended Logout is
attempted for all machine-check interrup-
tions when allowed by the Machine-Check
Extended Logout Mask in control register 14
(when bit 1 is one). The Extended Logout
information (see Figures 32 and 33) is
placed into one of the following main
storage locations:

• On the Model 145, into the location
addressed by the Machine-Check Extended
Logout Pointer contained in control
register 15.



• On the Model 135, into a save area 
within the Fixed Logout beginning at 
decimal location 256. 

See Figure 3 for the location of the 
extended logout.



Extended Logout for the Model 145: The 112
bytes of the extended logout shown in
Figure 32 are positioned in the first 112
bytes of the 192-byte area reserved for the
extended logout. The last 80 bytes of this 
area are unused. When an extended logout 
is not produced by the hardware, the Damage 
Assessment Field Buffer Area immediately 
follows the Fixed Logout in the main 
storage locations used to build a full MCH 
record. 



Extended Logout for the Model 135: The
extended logout, shown in Figure 33, occu- 
pies 14 bytes of a scratch area within the 
Fixed Logout of the Model 135. Also within 
this scratch area, following the MCH 
extended logout and a six-byte reserved
area, is a CCH logout of 4 bytes. Since 
the extended logout is completely contained 
within the Fixed Logout, the Damage Assess- 
ment Field Buffer Area immediately follows 
the Fixed Logout in main storage locations 
used to build a full MCH record. 
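The placement rules above can be summarized in a few lines. This sketch assumes the control-register contents are available as 32-bit integers and uses IBM bit numbering (bit 1 of control register 14 is X'40000000'); treating the low-order 24 bits of control register 15 as the logout address is an assumption, and the function name is hypothetical.

```python
CR14_EXTENDED_LOGOUT_MASK = 0x80000000 >> 1  # bit 1 of control register 14
MODEL_135_LOGOUT_LOCATION = 256              # within the Fixed Logout (decimal)


def extended_logout_address(model: int, cr14: int, cr15: int):
    """Return the storage location of the Extended Logout, or None when
    control register 14 bit 1 is off and no Extended Logout is produced."""
    if not (cr14 & CR14_EXTENDED_LOGOUT_MASK):
        return None
    if model == 145:
        # Assumption: the low-order 24 bits of CR15 hold the logout address.
        return cr15 & 0x00FFFFFF
    if model == 135:
        return MODEL_135_LOGOUT_LOCATION
    raise ValueError(f"unsupported model {model}")
```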



DAMAGE ASSESSMENT FIELD BUFFER AREA

This field of the MCH record occupies 74
bytes of main storage immediately following
the Extended Logout. If there is no
Extended Logout, this field follows the
Fixed Logout.* Figure 13 shows the damage
assessment data in this area (beginning
with byte 520 of the MCH error record).

MCH moves data into this portion of the
MCH error record from the Independent Com-
mon Area. Figure 13 indicates which por-
tions of the Independent Common Area are
used.

*Since the Extended Logout of the Model
145, even when present, consists of only
112 bytes of data, the Damage Assessment
Field immediately follows those 112 bytes
of data rather than following the entire
192-byte area reserved for the Extended
Logout. However, a full 74 bytes of main
storage is reserved for the Damage
Assessment Field following the 192 bytes
for the Extended Logout.

SUBSYSTEM DATA AREA (MODEL 145 ONLY)

The Subsystem Data Area occupies 64
bytes of main storage following the Damage
Assessment Field Buffer Area (see Figure
3). When Subsystem Analysis (IGFMCHF6)
determines that the error occurred in TSO,
it stores information in the Subsystem Data
Area for use by TSO and passes control to
the TSO Analysis module, IGFMCH91.

See the IBM System/360 Operating System
Time Sharing Option (TSO) Control Program,
GY27-7199, for further information.

MACHINE STATUS BLOCK

The Machine Status Block (MSB) is a sys-
tem control block used by the Soft Machine-
Check Handler and the Machine Status Con-
trol routines for recording and mode
switching.

The MSB is initialized during NIP pro-
cessing by IGFMCHF0 (MCH Initialization).
The CVTRMS field of the Communications Vec-
tor Table in the resident nucleus contains
the address of the MSB.

The fields of the MSB differ for the
Models 135 and 145. Below are the fields
of the MSB for each model. Displacements
from the address in the CVTRMS field are
shown as decimal (hexadecimal).

Model 135 Machine Status Block



MSBCOUNT 8 (8) 

One word used as a counter for soft 
errors.

MSBCR14 12 (C) 

One word used to store control regis- 
ter 14. 

MSBHDCPY 28 (1C) 

One word used as a pointer to the UCB 
of any display device attached to the 
Model 135. 

MSBMCW 0 (0)

Two words containing the status of the 
Model 135. 

MSBMODE 20 (14)

One byte used to indicate whether the 
Model 135 is in recording or quiet 
mode. 

MSBMSCON 24 (18) 

One word used as a pointer to the UCB 
of the master console. 

MSBTHRLD 16 (10) 

One word used to hold the soft error 
threshold value. 
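The Model 135 displacements above describe a 32-byte block. The sketch below unpacks it with Python's `struct` module, assuming big-endian byte order and skipping bytes 21-23, which this section does not describe; the names are hypothetical.

```python
import struct
from typing import NamedTuple


class Msb135(NamedTuple):
    mcw: bytes      # MSBMCW 0 (0): two words of machine status
    count: int      # MSBCOUNT 8 (8): soft error counter
    cr14: int       # MSBCR14 12 (C): saved control register 14
    threshold: int  # MSBTHRLD 16 (10): soft error threshold
    mode: int       # MSBMODE 20 (14): recording/quiet mode byte
    mscon: int      # MSBMSCON 24 (18): master console UCB pointer
    hdcpy: int      # MSBHDCPY 28 (1C): display device UCB pointer


# 8 + 4 + 4 + 4 + 1 + 3 (skipped) + 4 + 4 = 32 bytes
_MSB135 = struct.Struct(">8sIIIB3xII")


def parse_msb135(raw: bytes) -> Msb135:
    if len(raw) < _MSB135.size:
        raise ValueError(f"need at least {_MSB135.size} bytes")
    return Msb135(*_MSB135.unpack(raw[:_MSB135.size]))
```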
Model 145 Machine Status Block

MSBHDCPY 44 (2C)

One word used as the pointer to the
UCB of any display device attached to
the Model 145.

MSBMSCON 40 (28)

One word used as a pointer to the UCB
of the master console.

MSBNODSW 0 (0)

A byte of switches used as follows:

0xxx xxxx Main storage in quiet mode.

1xxx xxxx Main storage in record mode.

x1xx xxxx Control storage in record mode.

xx1x xxxx Control storage in quiet mode.

x00x xxxx Control storage in threshold mode.
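The MSBNODSW settings can be decoded as follows. This is a sketch assuming IBM bit numbering (bit 0 is X'80'); a setting with both bits 1 and 2 on is not described above, so the sketch rejects it. The function name is hypothetical.

```python
def decode_msbnodsw(switches: int) -> tuple[str, str]:
    """Return (main storage mode, control storage mode) from MSBNODSW."""
    # Bit 0 (X'80'): 1 = main storage in record mode, 0 = quiet mode.
    main = "record" if switches & 0x80 else "quiet"
    # Bits 1-2 (X'40', X'20'): 10 = record, 01 = quiet, 00 = threshold.
    control_bits = (switches >> 5) & 0b11
    control = {0b10: "record", 0b01: "quiet", 0b00: "threshold"}.get(control_bits)
    if control is None:
        raise ValueError("bits 1 and 2 both on: combination not described")
    return main, control
```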



Storage areas shown: MCHBUILD points to the MCH RECORD BUFFER (ABREC BUFFER); MCHINLOG points to the MACHINE INDEPENDENT LOGOUT SAVE AREA; CONTROL REGISTER 15 (MODEL 145 ONLY) points to the MCH EXTENDED LOGOUT**, which is followed by the DAMAGE ASSESSMENT FIELD. MCH Record Buffer Build Area: MCH records are built here before writing them to SYS1.LOGREC.

Record formats:

• MCH LONG RECORD (MODEL 145 ONLY): ABREC*, MACHINE INDEPENDENT LOGOUT, MCH EXTENDED LOGOUT, DAMAGE ASSESSMENT FIELD

• MCH LONG RECORD, when there is not an Extended Logout: ABREC*, MACHINE INDEPENDENT LOGOUT, DAMAGE ASSESSMENT FIELD

• MCH SHORT RECORD: ABREC (70 BYTES)

* Only 48 bytes of ABREC are recorded for MCH LONG RECORD.
** Extended Logout is contained within the Independent Logout for the Model 135.

Figure 30. Possible MCH error records
Dec (Hex) | Field
176 (B0) to 188 (BC) | ECSW; Reserved (Channel ID); I/O Extended Log Pointer; Reserved
232 (E8) | Machine Check Interrupt Code
240 (F0) | Reserved
248 (F8) | Failing-Storage Address; ECC Bits
252 (FC) | Control Word Address (Region Code)
256 (100) | Fixed Logout Area
352 (160) | Floating-Point Register Save Area
384 (180) | General Register Save Area
448 (1C0) | Control Register Save Area (last word at 508 (1FC))

Figure 31. Fields of the fixed logout
Dec (Hex) | Field (one word each)
0 (0) | Retry Counts
4 (4) | Machine Check Register A
8 (8) | Machine Check Register B
12 (C) | ABRTY Register
16 (10) | SPTLB Register
20 (14) | HMRTY Register
24 (18) | CPURTY Register
28 (1C) | Control Word
32 (20) | System Register
36 (24) | I Register of expanded local storage
40 (28) | U Register of expanded local storage
44 (2C) | W Register of expanded local storage
48 (30) | V Register of expanded local storage
52 (34) | X Register of local storage
56 (38) | R Register of local storage
60 (3C) | Y Register of local storage
64 (40) | Q Register of local storage
68 (44) | IBU Register of expanded local storage
72 (48) | TR Register of expanded local storage
76 (4C) | SPARE
80 (50) | SN Register of expanded local storage
84 (54) | PN Register of expanded local storage
88 (58) | WK Register of expanded local storage
92 (5C) | NP Register of expanded local storage
96 (60) | DM Register of local storage
100 (64) | DW Register of local storage
104 (68) | CPU Register (Mode Register)
108 (6C) | PSWCTL Register

Figure 32. Fields of the extended logout for the Model 145
Displacements within the Fixed Logout (each row is 4 bytes)

Dec (Hex) | Contents
256 (100), 260 (104) | CPU Checks; BAR; CPU Checks 1
264 (108) | Zone in Error | 0-0 | s* | SAR**
268 (10C) | SAR (continued)*** | Retry Threshold | Retry Flag | Retry Count
272 (110) | Interrupt Status Latches | Reserved
276 (114) | ICA Check Byte | Select Channel Checks 2 | IFA Check Byte | Select Channel Checks 3

* 5 bits in length
** 3 bits (Total SAR field is 19 bits long)
*** 16 bits

Figure 33. Fields of the Extended Logout for the Model 135
SECTION 5: DIAGNOSTIC AIDS



This section is intended to assist in 
diagnosing problems in the Machine-Check 
Handler program. Included are: register 
conventions, common machine-check interrup- 
tion codes, and possible problems that may 
exist when the "unexpected error" message 
appears.



REGISTER CONVENTIONS 

Figure 34 shows how MCH uses its regis-
ters. Two modules are exceptions to these
conventions: the Error Recorder (IGFMCHE2)
and Console Write (IGFMCHE1). Since these
modules operate more as part of normal
operating system processing than as part
of MCH processing, they follow the operat-
ing system's conventions.



COMMON INTERRUPTION CODE SETTINGS 

Figure 35 illustrates the machine-check
interruption code.

More than one error can be presented in
the interruption code. For example, the SD
and PD bits may both be on. In this case,
MCH handles the more serious error, SD.

Interruption codes in which no error
type is indicated (bits 0 through 8 are
zero) are considered invalid by MCH and
cause a disabled wait state and message
IGF015W.
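The validity rule above, and the way more than one error bit can be on at once, can be illustrated with the sample codes. This sketch examines the first word (bits 0-31) of the interruption code, names only subclass bits 0-4 (SD, PD, SR, TD, CD, as labelled in Figure 35), and uses IBM bit numbering; the function names are hypothetical.

```python
SUBCLASS_NAMES = ("SD", "PD", "SR", "TD", "CD")  # bits 0-4
ERROR_TYPE_MASK = 0xFF800000                     # bits 0 through 8


def bit_mask(bit: int) -> int:
    """Mask for bit n of a 32-bit word (bit 0 is the high-order bit)."""
    return 0x80000000 >> bit


def subclass_bits(word0: int) -> list[str]:
    """Names of the subclass bits that are on, in bit order."""
    return [name for bit, name in enumerate(SUBCLASS_NAMES) if word0 & bit_mask(bit)]


def is_valid_code(word0: int) -> bool:
    """MCH treats a code whose bits 0 through 8 are all zero as invalid."""
    return (word0 & ERROR_TYPE_MASK) != 0
```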



Register | Used by MCH as
0-9 | Work registers
10 | Pointer to the Communications Vector Table
11 | Pointer to the MCH Common Area
12 | Nucleus base register
13 | Used to hold the address of the save area in the I/O interface
14 | Contains the return address into the MCH Nucleus from transient modules
15 | Transient module base register

Figure 34. Register conventions



UNEXPECTED ERRORS 

When the Unexpected Error message 
appears, the following can be done to iso- 
late the cause of the error: 

1. Check the interruption code for vali-
dity (see Figure 35). If the inter-
ruption code is invalid, the error was
caused by a hardware malfunction. 

2. Check whether the Fixed Logout
represents the same machine check as 
the Extended Logout. 

3. Check the storage dump to see if a 
program check occurred. If a program 
check has occurred, and the instruc- 
tion address portion of the program 
check new PSW is the same as the 
instruction address portion of the 
machine check new PSW, the probable 
cause of the error is a program check in
the Machine-Check Handler. The His- 
tory Table in the MCH Common Area 
(MCHHISTY) can then be checked to 
determine in which MCH module the pro- 
gram check occurred. 
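Step 3 compares two fixed locations in the dump. On System/370 the program-check new PSW is at location 104 (X'68') and the machine-check new PSW at location 112 (X'70'), and the instruction address occupies bits 40-63 of the PSW. The sketch below performs only that comparison, assuming a storage image that starts at location 0; establishing that a program check actually occurred remains a separate step. The names are hypothetical.

```python
PROGRAM_NEW_PSW = 104        # X'68'
MACHINE_CHECK_NEW_PSW = 112  # X'70'


def instruction_address(psw: bytes) -> int:
    """Instruction address of an 8-byte System/370 PSW (bits 40-63)."""
    if len(psw) != 8:
        raise ValueError("a PSW is 8 bytes")
    return int.from_bytes(psw[5:8], "big")


def same_new_psw_address(low_storage: bytes) -> bool:
    """True when the program-check and machine-check new PSWs carry the
    same instruction address."""
    pc = low_storage[PROGRAM_NEW_PSW:PROGRAM_NEW_PSW + 8]
    mc = low_storage[MACHINE_CHECK_NEW_PSW:MACHINE_CHECK_NEW_PSW + 8]
    return instruction_address(pc) == instruction_address(mc)
```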



MCH HISTORY TABLE 

The MCH History Table (MCHHISTY in the 
MCH Common Area) can be used to determine 
which modules have been executed since the 
time of the machine-check interruption.
The table also shows the sequence in which 
they were executed. The modules are iden- 
tified by their IDs and level numbers. 

When MCH is entered, the MCH Nucleus 
places its own ID and level number in the 
last two bytes of the MCH History Table. 
When a successor module is specified, the 
module loader subroutine in the Nucleus 
moves the data in the table two positions 
(bytes) toward the beginning of the table. 
The ID of the successor module is then put 
into the next-to-last byte of the table and
hexadecimal "FF" is put into the last byte 
of the table. When the successor module is 
successfully loaded and given control, it 
places its level number into the last byte 
of the table, overlaying the X'FF' and
signaling that it has received control. 

As each module is loaded by the module
loader, this process is repeated such that
the most recently executing module has its
ID and level number in the last two bytes
of the MCH History Table; the next most
recently executed MCH module has its ID and
level number in the previous two bytes of
the table, and so on.

Machine-check interruption code bit assignments:

Bits | Use | Bit names
0-4 | Subclass | SD, PD, SR, TD, CD
5-13, 19, 26, 32-47 | Not assigned, stored as zero |
14-15 | Time of interruption (tense) | B, D
16-18 | Storage errors | SE, SC, PE
20-25, 27-31 | Validity | WP, MS, PM, IA, FA, RC, FP, GR, CR, LG, ST
48-63 | Machine-Check Extended Logout Length |

Sample interruption codes (bits 0-31, hexadecimal):

ERROR | INTERRUPT CODE
Multiple Bit Storage Error | 40008C9E
Retry Failed | 40000C1F
SPF Key Failure | 40002C9F
System Damage | 8000001E
ECC Corrected | 20004FDF
CPU Retry Corrected | 20000F1F
Timer Error (Loc 80) | 10000FDF
Time of Day clock error | 08000FDF

Figure 35. Sample machine-check interruption codes for the Model 145

Figure 36 shows the use of the MCH History Table in recording the following sequence of module execution:

1. MCH Nucleus - IGFMCHE0

2. Preliminary Error Analysis - IGFMCH41

3. MFT System Analysis 1 - IGFMFTF1

4. Soft Machine-Check Handler - IGFMCH40

5. Emergency Recorder - IGFMCHE3
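The update rule for the MCH History Table can be modeled in a few lines. The following Python sketch is illustrative only and is not MCH code: the class and method names are hypothetical, the table is assumed to start out zero-filled, the one-byte module IDs (X'E0', X'41', X'F1', X'40', X'E3') are assumed to be the last two characters of each module name, and every level number is assumed to be X'01'. The 28-byte length is taken from Figure 36.

```python
"""Minimal model of the MCH History Table (MCHHISTY) update protocol."""
from __future__ import annotations

TABLE_LENGTH = 28  # bytes, as shown in Figure 36
PENDING = 0xFF     # placed in the level byte until the successor runs


class HistoryTable:
    def __init__(self, nucleus_id: int, nucleus_level: int) -> None:
        # Assumption: the table starts out zero-filled.
        self.data = bytearray(TABLE_LENGTH)
        # On entry, the Nucleus puts its own ID and level in the last two bytes.
        self.data[-2] = nucleus_id
        self.data[-1] = nucleus_level

    def schedule(self, successor_id: int) -> None:
        """Module loader: shift two bytes toward the start, record the successor."""
        self.data[:-2] = self.data[2:]  # the oldest ID/level pair drops off the front
        self.data[-2] = successor_id
        self.data[-1] = PENDING

    def received_control(self, level: int) -> None:
        """Successor module: overlay X'FF' with its level number."""
        self.data[-1] = level

    def entries(self) -> list[tuple[int, int]]:
        """(ID, level) pairs, oldest first, skipping never-used zero pairs."""
        pairs = [(self.data[i], self.data[i + 1]) for i in range(0, TABLE_LENGTH, 2)]
        return [pair for pair in pairs if pair != (0, 0)]


if __name__ == "__main__":
    # Hypothetical one-byte IDs and level X'01' for the sequence in the text.
    table = HistoryTable(0xE0, 0x01)        # MCH Nucleus (IGFMCHE0)
    for module_id in (0x41, 0xF1, 0x40):    # IGFMCH41, IGFMFTF1, IGFMCH40
        table.schedule(module_id)
        table.received_control(0x01)
    table.schedule(0xE3)                    # IGFMCHE3 loaded, not yet running
    print(table.data[-4:].hex())            # prints 4001e3ff
```

In this model the last two bytes hold X'E3' and X'FF' because IGFMCHE3 has been scheduled but has not yet overlaid the X'FF' with its level number; a pair still showing X'FF' is the signal that a scheduled module never received control.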



MCHHISTY (offsets in hexadecimal, decimal in parentheses)

Offset    ID    LEVEL   ID    LEVEL
0  (0)
4  (4)
8  (8)
C  (12)
10 (16)                 E0    01
14 (20)   41    01      F1    01
18 (24)   40    01      E3    FF

Total length: 28 bytes

Figure 36. MCH history table



SECTION 6: MCH MODULE DIRECTORY 



The module directory is a guide to named areas of code in the program listing. The module names are listed in the table below in alphabetic order. The other columns contain the module's descriptive name, major functions, entry point, and library residence. The module name also serves as a flowchart identification and a microfiche reference.

Note: The name within parentheses (in the Module/CSECT Name column) identifies the module as it is cataloged in SYS1.SVCLIB. If there is no name in parentheses for an entry, the regular module name identifies the module in SYS1.SVCLIB.



Module/CSECT Name | Module Name/Major Functions | Entry Point | Library

IGFMCHE0 | MCH Nucleus: Initializes MCH. Handles unexpected interruptions. Interfaces with IOS for loading operations. | IGFMCHE0 | SYS1.LINKLIB

IGFMCHE1 (IGCR207B) | Console Write: Interfaces with the system WTO routine. | IGFMCHE1 | SYS1.SVCLIB

IGFMCHE2 (IGCR107B) | Error Recorder: Writes error records to SYS1.LOGREC. | IGFMCHE2 | SYS1.SVCLIB

IGFMCHE3 | Emergency Recorder: Writes error records when the system is unable to continue. | IGFMCHE3 | SYS1.SVCLIB

IGFMCHF0 | MCH Initialization: Loads and initializes the MCH Nucleus during IPL/NIP. | IGFMCHF0 | SYS1.LINKLIB

IGFMCHF1 | PDAR 1: Analyzes program damage and passes control to the appropriate PDAR module. | IGFMCHF1, IGFMVTF1, IGFMFTF1 | SYS1.SVCLIB

IGFMCHF2 | PDAR 2: Determines which part of the system has been affected by intermittent main storage errors and indicates the appropriate action. | IGFMCHF2, IGFMVTF2, IGFMFTF2 | SYS1.SVCLIB

IGFMCHF3 | PDAR 3: Determines which part of the system has been affected by solid main storage errors or SPF key errors and indicates the appropriate action. | IGFMCHF3, IGFMVTF3, IGFMFTF3 | SYS1.SVCLIB

IGFMCHF5 | PDAR Terminator: Prepares the task to be either abnormally terminated or set nondispatchable. | IGFMCHF5 | SYS1.SVCLIB

IGFMCHF6 | TSO Subsystem Analysis: Determines whether the machine check occurred in the subsystem area, and terminates or assists recovery of the subsystem as circumstances warrant. | IGFMCHF6 | SYS1.SVCLIB

IGFMCH40 | Soft Machine-Check Handler (145): Contains the mode-handling function. Prepares the recovery report for the SYS1.LOGREC data set. Terminates MCH. | IGFMCH40 | SYS1.SVCLIB

IGFMCH41 | Preliminary Error Analysis: Determines the recovery strategy for MCH based on the interruption code and the MCH Common Area. | IGFMCH41 | SYS1.SVCLIB

IGFMCH50 | Soft Machine-Check Handler (135): Contains the mode-handling function. Prepares the recovery report for the SYS1.LOGREC data set. Terminates MCH. | IGFMCH50 | SYS1.SVCLIB

IGF13501 | Machine Status Control (135): Permits the operator to control the mode of recording soft machine-check interruptions and to display the current mode status. | IGF13501 | SYS1.SVCLIB

IGF29701 | Machine Status Control (145): Permits the operator to control the mode of recording soft machine-check interruptions in main and control storage. | IGF29701 | SYS1.SVCLIB
APPENDIX A: MCH MESSAGE TABLE AND WAIT STATE CODES



Figure 37 lists the messages that are produced by MCH and the module that issues each 
message. The code, where shown, is a wait state code informing the operator of an error 
condition that caused the system to be placed in a wait state. 



Message | Code | Scheduled By | Issued By

No Message - An MCI occurred while MCH was handling an MCI in MCH.

IGF001W/S B } aaaaaa

A02

A06

IGF002W/S SUPVR MC

A05

IGFMCHF5

IGFMCHE0

IGFMCHF5
IGFMCHE3

IGFMCHE0

IGF004W [S] PGM CHK [CPU ID (IP

A03

IGFMCHE0

IGFMCHE0

IGF006W/S I/O ERR C
IGF012W/S MCI-CCH C

IGF013W/S CHAN ERR C

A04

A0B

A0A

IGFMCHE0
IGFMCHE3
IGFMCHE3

IGFMCHE0
IGFMCHE0
IGFMCHE0

IGFMCHE0

IGFMCHE1

IGF015W/S DNEX ERR
IGF020I ABEND
IGF023I LINK BLDL DLT

A14

IGFMCHE0
IGFMCHF5
IGFMCHF5

IGFMCHE1

IGF024I xxxxx SET NONDISPATCHABLE

IGFMCHF5

IGFMCHE1

IGF040I {TSO | TCAM} SYS {NDISP | ABEND}

IGFMCHF5

IGF044I ECC SUCCESSFUL

IGFMCH40
IGFMCH50

IGFMCHE1

IGFMCHE1

IGFMCHE1

IGFMCHE1

IGF045I MULTIPLE BIT ERROR

IGFMCH41

IGF048I HIR SUCCESSFUL

IGFMCH40
IGFMCH50

IGF050W/S TOD ERROR

IGF051I HRT ERROR

A16

IGFMCHE0
IGFMCHE0

IGFMCHE0
IGFMCHE1

IGF053I MODE STATUS-ECC {QUIET | RECORD} HIR {QUIET | RECORD}
COUNT-nn THRESHOLD

IGF13501

IGF13501

Figure 37. MCH message table (Part 1 of 2)
Message | Code | Scheduled By | Issued By

IGF055I QUIET MODE
IGF055I QUIET MODE ECC
IGF055I QUIET MODE ECC, HIR

IGF054E SYS1.LOGREC FULL

IGF056I I/O ERROR IN RECORDER

IGF057E DISK FORMAT ERROR

IGFMCHE2
IGFMCHE3
IGFMCH40
IGFMCH50

IGFMCHE1
IGFMCHE1
IGFMCHE1

IGFMCH40
IGFMCH50
IGFMCHE2

IGFMCHE2

IGFMCHE1
IGFMCHE1

IGFMCHE1

IGF060E SYS1.LOGREC NEAR FULL
IGF061I MODE SWITCH COMPLETE

IGF063I SVC BLDL DLT

IGFMCHE2

IGF29701
IGF13501
IGFMCHF5

IGFMCHE1

IGF29701
IGF13501

Figure 37. MCH message table (Part 2 of 2)

IGFMCHE1
APPENDIX B: MACHINE-CHECK INTERRUPTION CODE



The bit positions of the indicators in the machine-check interruption code are illustrated in Figure 35. The machine-check interruption code includes information about the type and severity of the error, the validity of the various fields that are stored, and the validity and length of the extended logout. The following describes the contents of the machine-check interruption code.



Bit 0 - SD

System Damage - This bit is set whenever interruptions may have been lost or damage has occurred that cannot be isolated to one or more of the less severe machine-check damage types, either internal or external.

Bit 1 - PD

Instruction Processing Damage - This bit is set when the extent of the damage is limited to an executed instruction or its associated operands.

Bit 2 - SR

System Recovery - This bit indicates that errors were detected but have been successfully recovered without loss of system integrity. See Bit 17.

Bit 3 - TD

Timer Damage - Damage has occurred to either the timer or location 80.

Bit 4 - CD

Time-of-Day Clock Damage - Damage has occurred to the time-of-day clock.

Bit 5 - ED

External Damage - (Not used for Model 145.) Indicates that a channel, channel controller, switching unit, or other unit external to the CPU or to a storage unit has been damaged during operations not directly associated with the CPU. ED is used to report damage of this type only when the more conventional reporting procedures, such as I/O interruption, are unavailable or impractical.

Bit 6 - Reserved

Bit 7 - AC

Automatic Configuration - (Not used for Model 145.) A buffer page in the CPU has been disabled by the hardware. Operations will continue, but with decreased performance.

Bit 8 - W

Warning - (Not used for Model 145.) Damage is impending to some part of the system; for example, loss of power or loss of cooling.

Bits 9 through 13 - Reserved

Bit 14 - B

Backup - The backup bit indicates that the machine state at the point of interruption has been restored to a hardware checkpoint state prior to the occurrence of the error; that is, the PSW, registers, and storage reflect a valid state either at the beginning of the instruction in error or at some prior instruction. If the backup bit is zero, a valid instruction address points to an instruction beyond the error.

Bit 15 - D

Delayed - This bit indicates that some or all of the information stored as a result of this interruption was delayed in being reported because the interruption type was masked off for the duration of one or more instructions.

Bit 16 - SE

Storage Error Uncorrected - Indicates that a reference to storage resulted in the detection of damaged data that could not be corrected.

Bit 17 - SC

Storage Error Corrected - Indicates that a reference to storage resulted in the detection of an error that was subsequently corrected. Bits 2 and 17 are set on when the frequency of error corrections for single-bit errors exceeds the limit set by hardware (Error Frequency Limit Overflow).

Bit 18 - PE

Protection Storage Error Uncorrected - Indicates that a reference to the Storage Protection Key resulted in the detection of an uncorrectable error in the key in storage. The keys in storage are not checked for errors during storage references when the PSW key is zero.

Bit 19 - Reserved

Bit 20 - WP

PSW Validity - Indicates that bits 12-15 of the machine-check old PSW are valid.

Bit 21 - MS

PSW Masks and Key Validity - Indicates that all PSW bits other than the Interruption Code, ILC, AMWP, IA, CC, and Program Mask of the machine-check old PSW are valid.

Bit 22 - PM

Program Mask and Condition Code Validity - Indicates that the program mask and condition code in the machine-check old PSW are valid.

Bit 23 - IA

Instruction Address Validity - Indicates that the instruction address in the machine-check old PSW accurately reflects the point in the instruction sequence at which the interruption occurred. Note that the instruction location at interruption and the instruction location at the time of the error may not be the same. If backup has been indicated, a valid instruction address will point to the instruction in error or prior to the error. If backup is not indicated, a valid instruction address will point to an instruction following the error.

Bit 24 - FA

Failing-Storage Address Valid - Indicates that the failing-storage address is valid.

Bit 25 - RC

Region Code Valid - (Not used for the Model 135.) Indicates that a valid region code has been stored.

Bit 27 - FP

Floating-Point Registers Valid - Indicates that the contents stored in the floating-point register save area are the same as the contents of the registers at the point of interruption.

Bit 28 - GR

General Registers Valid - Indicates that the contents stored in the general register save area are the same as the contents of the registers at the point of interruption.

Bit 29 - CR

Control Register Validity - Indicates that the contents stored in the control register save area accurately reflect the condition of the control registers at the time of interruption.

Bit 30 - LG

Log Valid - (Not used for the Model 135.) Indicates that the CPU extended log information was correctly stored.

Bit 31 - ST

Storage Logical Validity - Indicates that the contents of those storage locations that are modified by execution were restored to their contents at the point of interruption.

Bits 32 through 47 - Reserved

Bits 48 through 63

CPU Extended Log Length - (Not used for the Model 135.) This field indicates the length in bytes of the information stored in the extended log area, starting at the location specified by the CPU extended log pointer in control register 15. On a machine-check interruption when no logout occurs, this field is set to zero.
INDEX



Indexes to program logic manuals are consolidated in the publication IBM System/360 Operating System: Program Logic Manual Master Index, GY28-6717. For additional information about any subject listed below, refer to other publications listed for the same subject in the Master Index.

Where more than one reference is given, the major reference is first.

abbreviated record (ABREC) 20,109,111-112
ABEND 6,57-58
Abnormal End Appendage routine 50
ABREC (abbreviated error record) 20,109,111-112
allocation
of auxiliary storage 7-8
of data areas by NIP 47-48
altering PSWs
I/O new 17
machine-check new 14,19
program new 14
analysis of the hardware error
assessment of damage 19
automatic recovery features 1
auxiliary storage requirements 7-8

buffer build area, record 109,111-112
buffering and formatting 51-53

CCH, interface to and recording for 23-24
Channel-Check Handler, interface 23-24
channel error record 24
clock, time-of-day 17
codes, machine-check
interruption 17,125-126
common area 104-109,7,14
communications 10
area 3,10
between modules 8,10
Console Write routine, IGFMCHE1
flowchart 98
module description 59
control, passing of by transient
modules 18,8,9,14,17
control module 7
control register 14 3
control register 15 3,7
control registers 3
control storage malfunctions 5,17
CPU
malfunctions 4,5
retry 1,10,17,19
successful 17
CTFIELD 109

damage assessment 19
field buffer area 113
damage report (hard machine-check interruption) 1
degraded state of operation 14
dependent common area 104
disabled for interruptions 10-14

ECC (Error Checking and 
Correction) 1,3,10,17 

successful 17 

validity checking 3 
Emergency Recorder, IGFMCHE3 

flowchart 99-101 

module description 59 
emergency recorder module (IGFMCHE3) 59 
emergency recording 20-23 
enable hard errors 14 
error-on-error conditions 10-14 
error record 

CCH 24 

MCH 114,21-22 
Error Recorder, IGFMCHE2 

flowchart 94-97 

module description 58-59 
error recording 20-23 
error recovery 

levels of 6-7 

types 1,6-7 
errors, unexpected 118 
exercising a location 19 
extended logout 112-113,3,7,10,14,17 

mask bit 3 



fixed area (Permanent Storage 

Allocation) 3 
fixed logout 112,7,10,14 
fixed storage locations 3 
flowcharts 61-103 
formatting the error record 20-23 



hard errors 1,3-4,11,14 

hard machine check 1,3-4,11,14 

hard stop 14 

hardware 

error analysis 17-19 

malfunctions, types 17 

recovery features of Models 135 and 
145 1 
high-resolution timer 17
history of executed modules 118-120 



IGFIORTN, Module Loader 49 
IGFLOAD, I/O Initialization 49 
IGFMCHE0, MCH Nucleus 

flowchart 64-72 

module description 48-50 
IGFMCHE1, Console Write routine 
flowchart 98

module description 59 
IGFMCHE2, Error Recorder 

flowchart 94-97 

module description 58-59 
IGFMCHE3, Emergency Recorder 

flowchart 99-101 

module description 59 
IGFMCHF0, MCH Initialization

flowchart 62-63 

module description 47-48 
IGFMCHF1 (see IGFMVTF1 or IGFMFTF1) 
IGFMCHF2 (see IGFMVTF2 or IGFMFTF2) 
IGFMCHF3 (see IGFMVTF3 or IGFMFTF3) 
IGFMCHF5, PDAR Terminator 

flowchart 91-92 

module description 57-58 
IGFMCHF6, TSO Subsystem Analysis 

flowchart 93 

module description 58 
IGFMCH40, Model 145 Soft Machine-Check 
Handler 

flowchart 73-76 

module description 51-53 
IGFMCH41, Preliminary Error Analysis 

flowchart 81 

module description 53-54 
IGFMCH50, Model 135 Soft Machine-Check 
Handler 

flowchart 77-80 

module description 50-51 
IGFMFTF1, MFT System Analysis 1 

flowchart 87-88 

module description 56 
IGFMFTF2, MFT System Analysis 2 

flowchart 89 

module description 56-57 
IGFMFTF3, MFT System Analysis 3 

flowchart 90 

module description 57 
IGFMVTF1, MVT System Analysis 1

flowchart 82-83 

module description 54 
IGFMVTF2, MVT System Analysis 2

flowchart 84-85 

module description 54-55 
IGFMVTF3, MVT System Analysis 3

flowchart 86 

module description 55 
IGF13501, Model 135 Machine Status Control 

flowchart 103 

module description 59-60 
IGF29701, Model 145 Machine Status Control 

flowchart 102 

module description 59 
independent common area 104-109 
inhibit termination on 61 
inhibit termination requested 61 
initialization 

by MCH nucleus 10-17,47-48 

by NIP 47-48 
instruction 

processing damage 17-19 

retry 1,17 

unretryable 1,19
interface with CCH 23 
intermittent errors 19 
interruption code, machine-check 125-126 



common settings 118-119 

location 3 

some major fields 17 
interruptions disabled 10-14 
I/O 

and module loading 9,14-18 

communications area 3 

Initialization, IGFLOAD 49 

interruption 17 

new PSW 17 

Supervisor 14-17,8 



job termination 6-7,19-20,50-53 



key, damage to SPF 17,19 



levels of error recovery 6-7 

Link Pack Area, error occurs in 19 

loading 

successor modules 8,9,14,17,18 
transient modules 8,9,14,17,18 

logic of MCH 10 

logout 10 

lost-record counter 52 



machine-check

interruption code 125-126 
location of 3 
some major fields 17 
new PSW 14,19 
old PSW 10-14 
subclasses 17 
machine circuitry 10,12,13,17 
machine malfunctions 
sources of 1,2,17
types of 17 
Machine Status Control (Model 135),
IGF13501 

flowchart 103 
module description 59-60 
Machine Status Control (Model 145), 
IGF29701 

flowchart 102 
module description 59 
main storage 

malfunctions 5,17,19
requirements 7 
malfunctions, types of hardware 17 
MCH 

common area 7,11 
error record 114,112,21-22 
error recovery 6 
history table 118-120 
independent common area 104-109 
Initialization, IGFMCHF0 
flowchart 62-63 
module description 47-48 
Nucleus, IGFMCHE0 
flowchart 64-72 
module description 48-50 
nucleus area 17,7 
resident area 7 
transient area 7,9,14,17,18 
MCHDEB, fields of 107 
MCHINLOG 14,107
MCHINTEL, fields of 107 
MCHIOB, fields of 108 
MCHLSUM, fields of 108 
MCHNXIDS table 49,108
MCHTTRS table 49,109 
message table 123-124 
MFT System Analysis 1, IGFMFTF1 

flowchart 87-88 

module description 56 
MFT System Analysis 2, IGFMFTF2 

flowchart 89 

module description 56-57 
MFT System Analysis 3, IGFMFTF3 

flowchart 90 

module description 57 
MODE command 5-6,59-60
mode, automatic switching of 5-6,4,50 
mode handling for the Model 135 50,5-6 
mode handling for the Model 145 51,6 
model-dependent common area 104
modes, control of 3-5 
modes of recovery operation 4-5 
module 

directory 121-122 

loader 48-49,9,16-18 

loading 8,9,16-17,18 

scheduler 49 
multiple-bit storage errors 3,17,19
multiple errors 17 
MVT System Analysis 1, IGFMVTF1 

flowchart 82-83 

module description 54 
MVT System Analysis 2, IGFMVTF2

flowchart 84-85

module description 54-55
MVT System Analysis 3, IGFMVTF3

flowchart 86

module description 55



NIP (Nucleus Initialization Program) 

initialization of MCH 47-48 
Normal End Appendage routine 50 
normal recording procedures 19-20 
Nucleus, MCH (IGFMCHE0) 10-17,48-50 
nucleus area 7 



operation diagrams 25-46 
operator communications 5-6,10 
overlay routines, area for 7,8,9,14-17 
overlay structure of MCH 9 



parity checking 1-3 
PDAR 

control and action bytes 110-111 
modules 19 

(see also MFT and MVT System 
Analysis) 
Terminator, IGFMCHF5 
flowchart 91-92 
module description 57-58 
physical characteristics 7 
Preliminary Error Analysis (PEA), 
IGFMCH41 13 
flowchart 81 



module description 53-54,17 
priority of machine-check

interruptions 10-14 
program-check new PSW 14
program damage 

assessment and repair modules (PDAR) 19 

recovery 19 

procedures 19 
program termination 19-20,6 
PSA (Permanent Storage Allocation or fixed 

area) 3,14 
PSW 

altering 14,19 

I/O new altered 17 

saving the old 14 
purpose of the Machine-Check Handler 1 



quiet mode 3-6 



receiving control 10 

record buffer build area 109,112 

recording 

channel-check records 23-24 

emergency 20-23 

error records 20,21-22,24 

mode 3-4 
recording and termination 19-23 
recovery 6,19 

design for Models 135 and 145 1 

report (soft machine-check
interruption) 1,8 
register 

control 3 

conventions 118 
repair, system 7,19 
resident area 7 
restart, system-supported 6
retry 

CPU 1,17,19 

fails 1,19 

successful 17 

unsuccessful 17-19 
retry on 61 



saving the environment 14,47-48 

scheduling successor 
modules 48-49,18,14-17

second errors 10-14 

sequence of executed modules 118-120 

short error record 114,112,20 

SHUT (Special Handler for Unusual Terminations) routine 14,48

single-bit errors 1-6 

soft errors 27,1-6,10,12-13 

soft machine check 12-13 

Soft Machine-Check Handler (IGFMCH40) 
flowchart 73-76 
module description 51-53 

Soft Machine-Check Handler (IGFMCH50) 
flowchart 77-80 
module description 50-51 

solid errors 11,4,19 

solid, single-bit errors 1-4 

SPF key 17-19 

storage errors 17 
Storage Protect Feature (SPF) 17,19
storage requirements for MCH 7 
subclasses of machine-checks 17
subsystem data area 113,7 
successor modules, scheduling and 

loading 8,9,14-17,18,48-50 
supervisor area, error occurs in 19 
switching modes, automatic 3-6 
system 

damage 17,6,19 

disabled for interruptions 10-14 

recovery 6-7,19 

repair 6-7,19 

termination 6-7,17,19
System Analysis 

of MFT modules 56-57 

of MVT modules 54-55 
system-supported restart 6 
SYS1.LOGREC 7,19-20 
SYS1.SVCLIB 7 



task termination 6,19

TCB (Task Control Block), setting it nondispatchable 19

tense 17

Termination 46,53,57-58

termination, task and system 6-7,19

termination necessary 61

threshold mode 3-6

Time of Day Clock Damage 17

Timer Damage 17

transient area 7,9,14-18

transient modules 18,8-9,14-17

TSO (Time Sharing Option) Subsystem Analysis, IGFMCHF6
flowchart 93
module description 58

types

of error recovery 1-3,6-7
of hardware malfunctions 17



unexpected errors 118
unretryable instruction 17
unsuccessful retry 17-19



validity bits 17,125-126 



wait state 17,19 
codes 123-124 
warm start (system-supported restart) 
READER'S COMMENT FORM

IBM System/360 Operating System: Machine Check Handler for the IBM System/370 Models 135 and 145

Order No. GY27-7237-1


